Introduction
Bruce  W.  Suter,  AFIT 

The Air Force Institute of Technology (AFIT) was pleased to host the international workshop
on “The Role of Wavelets in Signal Processing Applications”. This workshop, held
12-13  March  1992  at  Wright-Patterson  Air  Force  Base,  Ohio,  brought  together  leading 
researchers  from  both  the  mathematics  and  signal  processing  communities.  As  such,  the 
workshop  provided  a  forum  for  the  interchange  of  ideas,  practical  experiences,  and  recent 
advances.  This  invitation-only  workshop  was  intentionally  kept  small  in  size  in  order  to 
encourage  active  participation  by  all  attendees. 

The  purpose  of  this  workshop  was  to  gain  a  perspective  of  the  role  of  wavelets  in  signal 
processing,  and  to  form  a  vision  of  where  we  should  look  as  a  research  community.  With 
this  in  mind,  the  workshop  sought  to  address  the  following  objectives: 

1.  to  highlight  major  accomplishments  and  limitations  in  the  use  of  wavelets  in  applied 
mathematics  and  signal  processing, 

2.  to  define  the  current  status  of  wavelets  research  in  these  fields, 

3.  to  present  the  challenges  for  future  wavelets  research  in  mathematics  and  in  signal 
processing. 

In  order  to  develop  the  workshop  with  these  objectives  and  to  stimulate  discussions,  brief 
presentations  were  given  by  several  of  the  attendees.  The  sponsor  of  the  workshop,  the 
Air Force Office of Scientific Research (AFOSR), requested that an official document be
generated to commemorate this workshop. As a result, attendees provided a paper to be
included in these proceedings. Since these papers were not refereed by the organizers,
it is acceptable for them to be submitted to journals for publication.

In  order  to  prepare  the  participants  for  meaningful  discussions,  the  organizers  requested 
that  each  one  answer  the  following  questions,  prior  to  their  arrival,  regarding  an  objective 
comparison  of  wavelets  against  techniques  in  applied  mathematical  analysis  and  signal 
processing:

1.  For  what  problems  have  wavelets  been  shown  to  be  clearly  superior  to  all  other  known 
techniques? 

2. For what problems is the use of wavelets clearly inferior to all other known
techniques?

3. In what problems have wavelets shown promise, although to date the wavelet-based
research results obtained are not superior to other known techniques?

While  the  participants  were  promised  that  their  responses  would  not  be  published,  their 
contributions  to  these  proceedings  certainly  reflect  them. 
Summary

Jon  Sjogren,  AFOSR 

The AFIT/AFOSR Workshop on “The Role of Wavelets in Signal Processing”, sometimes
known as “Wavelets Workshop II” or “WW II”, was unique in several respects. To
a great extent it was shaped by the course of the previous year’s meeting, the “Symposium
on Applications of Wavelets to Signal Processing” (WW I). Last year’s meeting,
also held at Wright-Patterson AFB, was an order of magnitude larger, having grown
considerably from the modest plans and expectations of its organizers.

The first Workshop was kicked off with a superlative series of tutorial talks, delivered
mainly by Maj Greg Warhola, USAF. The tutorials were followed by invited presentations,
which served to acquaint the diverse audience with specific wavelet methodologies
and their applications, from spread-spectrum communications, electronic warfare,
modeling of noise processes, and feature identification in EEG and PET scans, to
speech and image compression. A banquet and panel discussion brought out some
fundamental issues on the meaning and interpretation of signals. A focal point was the
question of how important wavelets “ultimately” would turn out to be, say in 10 years,
as a tool and technique in Signal Processing. On a scale of 0 to 10, responses ranged
from “1” to “10+”. Electrical engineering researchers tended to hold the more skeptical
position, while mathematicians (mainly “wavelet” mathematicians) took a more
exuberantly positive stand. Other observations included the importance of Conjugate
Quadrature Filters as a precursor to analysis via wavelets. Alan Willsky of MIT, though
technically not part of the panel, forcefully succeeded in raising the consciousness of
the participants regarding the dynamics of scale as a variable of value equal to time.
The festivities in 1991 were brought to a close as no one seemed to have a satisfactory
answer to an annoying but persistent question: “what means frequency?”

A significant achievement of both WW I and WW II was that the cultural gap
between theoreticians and practitioners was closed. This was evidenced by the large
variance of the “importance” assessments. Evidently a mutual learning process is
accelerating.

A few major “technical” themes stand out from the talks and discussions of
Wavelet Workshop II. I mention two: (1) time/scale analysis versus frequency analysis
of signals, and (2) filter-bank structures versus discrete wavelet expansions.
As Leon Cohen pointed out in the lead talk, scale and “reciprocal frequency”
are related in a subtle way and cannot be taken as identical for all purposes:
the operator algebras that they generate are different. On the second point, it is by
now generally understood that discrete wavelet analysis can be completely carried out
with banks of subband filters, down- and up-sampling, etc. Clearly there are many
such filter bank configurations that may be useful in engineering systems and employ
adaptive features, which take into account varied distributions of particular signal
components and so forth, but that will not likely be of much interest to a mathematical
theory of wavelets. But it is also coming to be acknowledged that studies made by the
mathematical-physicist pioneers in wavelets (such as those involving regularity properties)
are proving to be a significant consideration and guide in filter design. The lunchtime
talks and discussions brought out both of these points and other issues as well.
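
The remark that discrete wavelet analysis can be carried out entirely with subband filters and down- and up-sampling can be illustrated with a small sketch. It assumes the two-tap Haar filter pair, chosen here only because it is the simplest case; the function names and the test signal are illustrative and do not come from the workshop material.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # normalisation that makes the Haar pair orthonormal


def haar_analysis(x):
    """Lowpass/highpass filter an even-length signal and keep every second sample."""
    half = len(x) // 2
    coarse = [SCALE * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    detail = [SCALE * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return coarse, detail


def haar_synthesis(coarse, detail):
    """Up-sample both subbands, filter and add; this inverts haar_analysis."""
    x = []
    for c, d in zip(coarse, detail):
        x.append(SCALE * (c + d))
        x.append(SCALE * (c - d))
    return x


def haar_dwt(x, levels):
    """Discrete wavelet transform: iterate the two-channel bank on the coarse band."""
    coarse, details = list(x), []
    for _ in range(levels):
        coarse, d = haar_analysis(coarse)
        details.append(d)
    return coarse, details


def haar_idwt(coarse, details):
    """Invert haar_dwt by running the synthesis bank from coarse to fine."""
    for d in reversed(details):
        coarse = haar_synthesis(coarse, d)
    return coarse


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = haar_dwt(signal, levels=3)
reconstructed = haar_idwt(coarse, details)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

Longer filters change only the filtering step; the tree of filtering and down-sampling stages stays the same.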

It  has  been  forcefully  expressed  that  all  these  remarks,  and  especially  those  of 
Prof.  Grossmann  on  the  “History  of  Wavelets”  at  the  Thursday  dinner,  be  collected 
and  included  in  this  volume.  This  is  eminently  desirable,  but  the  use  of  recording 
devices was ruled out as too inhibiting. In any case only a faint reflection of the
mood,  the  spontaneity,  the  depth  of  experience  that  welled  up  on  those  occasions, 
could  possibly  be  captured  in  print.  We  are  consoled  with  the  thought  that  future 
work,  publications  and  talks  will  owe  something  to  the  rare  conditions  of  insight  and 
mutuality  that  prevailed  for  a  short  time. 

Martin Vetterli applies multiresolution decomposition in channel coding to commercial
digital broadcast. Transmitting a coarse signal separately from its details will
allow  a  “gracefully  degrading”  received  signal  in  this  future  broadcast  environment. 

Patrick Flandrin and Greg Wornell are both interested in signals with self-similar
qualities at different scales (“fractal”). Noise with a 1/f^α spectral characteristic can
fall into this category. This self-similarity can be either stochastic (“fractional Brownian
motion”) or deterministic. The wavelet transform is close to the ideal decorrelating
transform for fractional stochastic processes (“Karhunen-Loeve”). This gives a way to
estimate the “alpha” parameter among others. Wornell shows how a scheme of encoding
a message at several scales in a deterministically self-similar carrier can provide
robustness in “difficult” environments where conventional modulation falls short.
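
One way to read the remark about estimating the “alpha” parameter: for a process with a 1/f^α spectrum, the variance of orthonormal wavelet detail coefficients grows roughly like 2^(jα) with level j, taking j to increase toward coarser scales, so a line fitted to log2(variance) against level gives an estimate of α. The sketch below is only an illustration under assumptions that are not from the workshop: a Haar wavelet, an ordinary random walk (α = 2) as the test process, a fixed random seed, and an arbitrary choice of levels 3 to 7 for the fit.

```python
import math
import random


def haar_step(x):
    """One stage of the orthonormal Haar filter bank (filter, then down-sample by 2)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    coarse = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return coarse, detail


def estimate_spectral_exponent(x, first_level=3, last_level=7):
    """Least-squares slope of log2(detail variance) against decomposition level."""
    levels, log_vars = [], []
    coarse = list(x)
    for level in range(1, last_level + 1):
        coarse, detail = haar_step(coarse)
        if level >= first_level:
            # Detail coefficients are treated as zero-mean, so the mean square is used.
            variance = sum(d * d for d in detail) / len(detail)
            levels.append(level)
            log_vars.append(math.log2(variance))
    mean_l = sum(levels) / len(levels)
    mean_v = sum(log_vars) / len(log_vars)
    num = sum((l - mean_l) * (v - mean_v) for l, v in zip(levels, log_vars))
    den = sum((l - mean_l) ** 2 for l in levels)
    return num / den


# Test process (an assumption of this sketch): a Gaussian random walk,
# i.e. sampled Brownian motion, whose spectrum behaves like 1/f^2.
rng = random.Random(0)
walk, position = [], 0.0
for _ in range(2 ** 14):
    position += rng.gauss(0.0, 1.0)
    walk.append(position)

alpha_hat = estimate_spectral_exponent(walk)
```

The estimate should land near 2 for this test process; the finest levels are left out of the fit because discretisation effects are largest there.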

The talks of Alan Willsky and Robert Tenney were related and scheduled so that they
were separated only by the first lunch and discussion. Alan’s talk introduced the concept,
difficult to this writer, of stochastic processes defined on certain graphs, where correlations
can be inferred across levels, which represent different magnitudes of scale.
Bob Tenney demonstrated an interactive environment that has been built around this
theoretical framework: rapid estimation based on image data for background/terrain
that possesses self-similar characteristics. Such techniques seem especially suitable for
reconstructing “ground truth” from diverse data sources (“sensor fusion”).

Stephane Mallat sees a need for adaptive, possibly non-linear transforms to complement
existing methods in time-frequency analysis. His “dictionaries” and “structure
books” allow more efficient representation of a signal that has disparate types of component
(sinusoidal, pulse, fractal) than would, say, a wavelet-packet basis on its own.

Ingrid Daubechies and Albert Cohen also approached the issue of making wavelet-packets
even more flexible and adaptive. A significant problem here has been to define
wavelet-bases  for  a  finite  interval.  This  is  very  recent  mathematics,  with  substantial 
ramifications  for  functional  analysis.  In  addition,  this  may  well  amount  to  a  conceptual 
breakthrough  for  applications  on  the  order  of  wavelet-packets  themselves  (which  were 
brand-new  only  two  years  ago!) 

Alexander  Grossmann’s  evening  historical  retrospective  was  inspiring  and  set  the 
mood  for  eloquent  reminiscences  by  Thomas  Barnwell  and  P.P.  Vaidyanathan.  Prof. 
Grossmann recounted the early ideas of Morlet in seismic analysis and how collaboration
led to connections with unitary operators and their representations.
Gregory Beylkin pointed out some of the unesthetic features of compactly supported
wavelets, such as lack of symmetry or shift invariance. He remedies this situation
by considering shift-classes (bringing in the shift invariance), and using autocorrelation
functions of wavelets as a basis, replacing the unsymmetrical wavelets.

In  addition  to  his  exposition  of  the  history  of  filter  banks  for  subband  coding, 
Thomas  Barnwell  explained  how  a  general  setting  using  Lapped  Orthogonal  Transforms 
can  lead  to  analysis/synthesis  systems  with  higher  frequency  resolution  than  available 
with  dyadic  wavelets. 

P.P. Vaidyanathan, concentrating in his historical account on the origin of the
concept of paraunitary matrices (which are key to a ‘polyphase’ analysis of filtering/sampling),
gave a succinct theorem for filter bank convolution, emulating the classical
Fourier convolution theorem. Characteristically, P.P. immediately applies this to an
improvement  in  the  coding  gain  that  results  from  a  subband  coding/vector  quantization 
scheme,  which  uses  band-by-band  convolution  in  the  synthesis  step. 

John Benedetto is a master of sampling theory; the theory of frames in a
Hilbert  space,  motivated  by  Wigner-Ville  (Gabor)  wavelet  methods,  is  seen  to  be  the 
right  tool  for  expressing  “interpolations”  more  generally  than  given  by  the  Whittaker- 
Nyquist  formula,  especially  for  irregular  sampling. 

Bruce Suter talked about his work with AFIT colleague and Workshop co-organizer
Mark Oxley on variable length windows and weighted orthonormal functions.
This  approach  puts  windowed  (“short-time”)  Fourier  analysis,  which  ostensibly  is  a  lot 
of  ad  hoc  engineering,  into  a  consistent  and  logical  context. 

The participants seem unanimous that this was a rare meeting. Months later
its special circumstances are recounted from Toulouse to Il Ciocco, from Berkeley to
Oberwolfach. It is not possible to reproduce the lively discussions that took place,
but the participants will all remember a bit of what was said, and by whom, and can
get in contact for elaboration. We expect to see some of the contentions and “resolutions”
that arose propagate to magazine articles and learned forums. The sine function
figures in the solution of Maxwell’s equations, though it is not the only solution - is it
thereby God-given? (Kronecker’s assertion about the natural numbers notwithstanding.)
More to the point, light does exhibit sinusoidal wave characteristics (before we
knew all about E&M). Thus “spectral analysis” lets you discover the element Helium.
Legendre gave us his polynomials (or God gave us Legendre) so that we can measure
the shape of the earth. Classical physics is old, and such breakthroughs will be rarer.
Wavelets did not leap to prominence as eigenfunctions of a differential operator (is
there a Bessel wavelet?) but they have a claim to mathematical naturalness (bases
of Calderon), physical importance (“coherent states”), and are in tune with the Age
of Computation (sparsifying Toeplitz operators). Speaking from the point of view of
combinatorics/algebra, workers in this field as well would have stumbled sooner rather
than later onto Perfect Reconstruction Filter Banks.

There is surely no better proponent of the mathematics of wavelets than R.
Raphael Coifman, and no better proponent of an enlightened scrutiny of their potential
in signal processing than Alan V. Oppenheim. In Prof. Coifman's after-lunch
remarks, he showed himself keenly aware of the complexities and trade-offs inherent
in applying a great new idea, and came close to declaring himself a born-again engineer.
Prof. Oppenheim, mutatis mutandis, acknowledged a renewed appreciation for
the applied mathematics community’s contribution just over the past year.

Some practical conclusions seemed to be in consensus. Wavelet-based methods
of themselves cannot compete in speech compression with the algorithms derived over
decades by the expert speech modelers. For certain particular compression tasks (fingerprint
images?), nearly “raw” wavelet methods are running very strongly. One must
bear in mind that the various image and speech communities have not had time to
incorporate the best of wavelet methods into their tradecraft. Preliminary results indicate
that it is precisely a combination of time-scale (wavelet) and frequency methods
that approaches optimality for a problem of importance such as estimating frequency-hopping
parameters in a “covert communications” scheme.

If  another  Wavelet  Workshop  is  held  in  a  few  years  (with  a  different  name?),  we 
can  anticipate  some  startling  developments.  On  the  one  hand,  multiresolution  science 
will be more firmly entrenched in Control Theory, Time-Series modeling, and Dynamics.
On the other hand, revelations of mathematical appropriateness will emerge. Recently
it was seen how the Naparst equations for Range-Doppler imaging have a wavelet
transform interpretation. This leads to optimal solutions for an emitted radar waveform
in this imaging: they are elements of a wavelet basis! Another significant theoretical
observation is the utility of locally transforming in the Fourier domain. This holds
immediate  promise  for  speeded-up  and  less  expensive  MRI  scanning.  These  exciting 
topics  are  among  those  that  would  be  treated  in  a  future  meeting. 

The  Dayton  Workshop  was  indeed  a  watershed  in  terms  of  consolidating  the 
achievements  of  the  wavelet  community.  We  feel  that  the  Workshop  succeeded  if  it 
has made it easier for those interested to tap into the corpus of tools, techniques and
understanding  that  transcends  the  collection  of  individual  practitioners. 
Left to right: Greg Wornell, Stephane Mallat, Alexander Grossmann, Alan Oppenheim,
Patrick Flandrin, Thomas Barnwell III, Leon Cohen, Gregory Beylkin, Jon Sjogren, Albert
ABSTRACTS OF THE WORKSHOP


A  FILTER  BANK  PERSPECTIVE  ON  DISCRETE  WAVELET  TRANSFORM 

Thomas  Barnwell 
Georgia  Institute  of  Technology 

The Discrete Wavelet Transform (DWT) can be modeled as a special case of a general lapped
transform based on filter banks. Design methodologies for such systems are quite mature. A
fundamental issue involves the performance degradation implicit in the DWT as compared
with other similar but unconstrained transforms.

MULTIRESOLUTION  REPRESENTATIONS  USING  THE  AUTO-CORRELATION 
FUNCTIONS  OF  COMPACTLY  SUPPORTED  WAVELETS 

Gregory  Beylkin 

University  of  Colorado  at  Boulder 

In my talk I will describe a multiresolution representation of signals using dilations and
translations of the auto-correlation functions of compactly supported wavelets. This representation
was developed together with Naoki Saito of Schlumberger-Doll Research. Although
the set of dilations and translations of the auto-correlation functions does not form an orthonormal
basis, a number of properties of these functions make them useful for signal and
image analysis. Unlike wavelet-based orthonormal representations, this representation has
(1) symmetric analyzing functions, (2) shift-invariance, (3) natural and simple iterative interpolation
schemes, (4) a simple algorithm for finding the locations of the multiscale edges
as zero-crossings. It also leads to a non-iterative method for reconstructing signals from their
zero-crossings and slopes at these zero-crossings.
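
The symmetry claimed in point (1) can be seen at the level of filter coefficients. The abstract concerns the auto-correlation functions of the wavelets themselves; the sketch below works only with a filter, and it assumes the 4-tap Daubechies lowpass filter as the example, which the abstract does not name. The filter is not symmetric, but its auto-correlation sequence is, and the nonzero even lags vanish because the filter is orthogonal to its own even shifts.

```python
import math

# 4-tap Daubechies lowpass filter (an illustrative choice for this sketch).
sqrt3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
h = [(1 + sqrt3) / norm, (3 + sqrt3) / norm, (3 - sqrt3) / norm, (1 - sqrt3) / norm]


def autocorrelation(filt):
    """Return {lag: sum over n of filt[n] * filt[n + lag]} for every overlapping lag."""
    n = len(filt)
    return {
        lag: sum(filt[i] * filt[i + lag] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-(n - 1), n)
    }


a = autocorrelation(h)

# The filter itself is not symmetric about its centre ...
filter_is_symmetric = all(abs(h[i] - h[-1 - i]) < 1e-12 for i in range(len(h)))
# ... but its auto-correlation is symmetric in the lag.
autocorrelation_is_symmetric = all(abs(a[k] - a[-k]) < 1e-12 for k in a)
```

For this filter the lags 0, ±1, ±2, ±3 come out as 1, 9/16, 0 and -1/16.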

WAVELET  BASES  ADAPTED  TO  AN  INTERVAL 
Albert  Cohen 

CEREMADE, University of Paris IX - Dauphine

Orthonormal  and  biorthogonal  wavelet  bases  have  found  many  interesting  applications  in 
signal  and  image  processing  (compression)  as  well  as  in  fast  numerical  analysis.  In  these 
applications,  one  always  deals  with  a  signal  or  a  function  which  has  a  finite  expansion  in 
space  and/or  time.  The  problem  arises  then  of  adapting  these  bases,  usually  constructed  for 
the  analysis  on  the  whole  real  line,  to  a  finite  interval,  [0,1]  for  example. 

In this talk, we shall describe and discuss the different possibilities for solving this problem
and the related algorithms. A construction obtained in joint work with Ingrid Daubechies
and Pierre Vial, which provides both a sharp analysis at the borders and a simple algorithmic
structure, will be explained in more detail.
A SIMPLE APPROACH TO JOINT SCALE REPRESENTATIONS

Leon  Cohen 

Hunter College and Graduate Center of CUNY

Various authors have proposed representations involving scale and time or scale and frequency.
The distributions they obtained are very different from each other. We will show
that  there  is  a  simple  conceptual  principle  for  obtaining  joint  distributions  and  show  that 
there  have  been  two  different  notions  of  scale  used.  The  approach  presented  clarifies  this 
distinction.  We  obtain  the  distributions  previously  given  in  a  simple  direct  manner.  The 
concept  of  instantaneous  frequencies  is  generalized  to  instantaneous  scale  and  the  uncertainty 
principle  for  scale  is  obtained. 

ADAPTED  WAVEFORM  ANALYSIS 
Ronald  Coifman 
Yale  University 

Local  variable  length  libraries  of  windowed  trigonometric  bases  are  dual  versions  of  wavelet 
and wavelet packet algorithms. Efficient parameter extractions and compressions for sounds
and  images  can  be  obtained  by  selecting  best  bases  out  of  these  libraries,  in  either  frequency 
or  time  domain. 


NONSEPARABLE  TWO-DIMENSIONAL  WAVELETS 
Ingrid  Daubechies 
Rutgers  University 

Many two-dimensional wavelet applications use a tensor product multiresolution analysis.
One can also construct “genuine” (non tensor product) two-dimensional multiresolution
analyses, possibly corresponding to matrix dilations. A special case is given by “quincunx”
subsampling. The talk shows how orthonormal and biorthogonal bases can be constructed
for this case, and discusses their regularity.
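
A small sketch of what quincunx subsampling with a matrix dilation looks like on an integer grid. The particular dilation matrix D below is one common choice and is an assumption of this sketch, as are the helper names; the abstract does not specify a matrix. The points D·n form the checkerboard-like lattice of index pairs with an even sum, D has determinant of magnitude 2 (so half the samples are kept), and applying D twice gives ordinary dyadic dilation in both directions.

```python
# One common quincunx dilation matrix (assumed here for illustration).
D = ((1, 1), (1, -1))


def mat_vec(m, v):
    """Multiply a 2x2 integer matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def mat_mul(a, b):
    """Multiply two 2x2 integer matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def quincunx_subsample(image):
    """Keep the samples whose row and column indices have an even sum."""
    return {
        (i, j): image[i][j]
        for i in range(len(image))
        for j in range(len(image[0]))
        if (i + j) % 2 == 0
    }


lattice_points = [mat_vec(D, (n1, n2)) for n1 in range(-3, 4) for n2 in range(-3, 4)]
determinant = D[0][0] * D[1][1] - D[0][1] * D[1][0]
D_squared = mat_mul(D, D)

image = [[10 * i + j for j in range(4)] for i in range(4)]
kept = quincunx_subsample(image)
```

Because the dilation mixes the two axes, the resulting multiresolution analysis is not a tensor product of one-dimensional ones.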

TIME-SCALE  ANALYSIS  OF  SELF-SIMILAR  SIGNALS 

P.  Flandrin 

Ecole  Normale  Superieure  de  Lyon 

In a number of different physical situations (1/f noises, turbulence, texture analysis, ...), we
are faced with fractal or multifractal signals for which it would be desirable to have at hand
methods which would allow one to estimate efficiently the corresponding scaling laws or the
underlying self-similarity structures. The recently introduced techniques of time-scale analysis
(wavelet transforms and generalizations) offer such a possibility, especially in the case
of locally self-similar signals, i.e. those for which scaling laws are time-dependent. Starting
from the idealized case of fractional Brownian motions, we will show which advantages
can be gained from such approaches, either for obtaining almost Karhunen-Loeve (doubly
orthogonal) representations via orthogonal wavelet bases, or for defining general classes of
estimators (aimed at scaling exponents) via bilinear time-scale representations which generalize
the usual Wigner-Ville distribution.
COMPOSITE WAVELETS
Alexander Grossmann

Centre  National  de  la  Recherche  Scientifique 

In  some  applications  of  continuous  wavelet  transforms,  it  is  important  to  choose  an  analyzing 
wavelet  with  very  peaked  analyzing  kernel.  There  exist  inequalities  showing  that,  in  a  certain 
sense,  arbitrarily  high  peaking  is  not  possible.  However,  wavelet  reproducing  kernels  are 
precisely  broad-band  ambiguity  functions,  and  there  exists  in  the  radar  literature  an  extensive 
body of information which shows the way around these constraints. This information is
applied to the construction of custom-made analyzing wavelets. Both analytic and numerical
aspects  of  the  subject  are  discussed. 

ADAPTIVE  TIME/FREQUENCY  SIGNAL  REPRESENTATION 

Stephane Mallat
Courant  Institute 

We proved that detecting the wavelet transform local maxima allows us to locate and characterize
singularities. A close approximation of the signal can be recovered from these local
maxima.  Such  an  adaptive  sampling  of  the  wavelet  transform  has  applications  in  pattern 
recognition,  compact  image  coding  and  noise  removal.  We  are  extending  this  technique  to 
a  larger  class  of  transforms  that  are  local  in  the  time/frequency  plane.  The  transform  is 
adapted  in  order  to  obtain  a  compact  time/frequency  signal  representation. 
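
The idea of locating a singularity from local maxima of a wavelet transform can be sketched on a toy signal. The transform below is a crude stand-in chosen for this sketch, not the wavelet used by the author: an undecimated difference of adjacent local means, which behaves like a smoothed derivative at the given scale. The test signal, a linear trend with a jump at sample 40, is also an assumption.

```python
def haar_like_transform(x, scale):
    """Difference between the mean of the next `scale` samples and the previous `scale`."""
    out = {}
    for n in range(scale, len(x) - scale + 1):
        right = sum(x[n:n + scale]) / scale
        left = sum(x[n - scale:n]) / scale
        out[n] = right - left
    return out


def modulus_maxima(w):
    """Positions where |w| is strictly larger than at both neighbouring positions."""
    positions = sorted(w)
    return [
        n for n in positions[1:-1]
        if abs(w[n]) > abs(w[n - 1]) and abs(w[n]) > abs(w[n + 1])
    ]


# Toy signal: a linear trend with a jump discontinuity at sample 40.
jump_position = 40
x = [n + (50 if n >= jump_position else 0) for n in range(64)]

# The same position is flagged at every scale, which is what lets the
# maxima be chained across scales to pin down an isolated singularity.
maxima_by_scale = {
    scale: modulus_maxima(haar_like_transform(x, scale)) for scale in (1, 2, 4, 8)
}
```

Integer sample values and power-of-two scales are used so that the flat regions of the transform are exactly constant and produce no spurious maxima.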

ON VARIABLE LENGTH WINDOWS AND WEIGHTED ORTHONORMAL
FUNCTIONS

Bruce  Suter  and  Mark  Oxley 
Air  Force  Institute  of  Technology 

A  new  formulation  is  presented  for  the  analysis  and  synthesis  of  signals.  This  formulation 
is  composed  of  a  variable  length  window  and  a  linear  combination  of  weighted  orthonormal 
functions. Tradeoffs in the specification of windows are considered. A sinusoidal example is
considered  and  a  fast  algorithm  is  provided  for  its  evaluation. 

ESTIMATION ON MULTISCALE NETWORK MODELS OF RANDOM
FIELDS
Robert  Tenney 
ALPHATECH,  INC. 

Multiscale tree-structured models of two-dimensional random processes are known to lead to
exceedingly efficient estimation algorithms. However, realizations of the processes defined
by these models contain artifacts atypical of most imagery, such as discontinuities along
quadrant boundaries. Augmenting these tree models with one-dimensional multiscale models
along those boundaries removes the artifacts in samples of the process. However, it also
transforms the tree into a network. This talk presents samples of this class of stochastic process,
along with a sketch of a general estimation theory which shows that (1) the complexity
of the statistics to be maintained by an estimator is independent of the depth of the model,
and (2) update of these statistics from a point measurement can be accomplished in time
strictly proportional to the number of pixels at the finest scale.
PARAUNITARY CONVOLVER
P.P. Vaidyanathan
California  Institute  of  Technology 

The maximally decimated filter bank (perhaps with nonuniform decimation) can be regarded
as a transformation from time to time-frequency. Examples of special cases include the DFT
and  the  short  time  Fourier  transform.  The  filter  bank  transformer  has  also  been  regarded 
as  the  discrete-time  wavelet  transformation  by  some  researchers  in  the  community.  Now,  for 
the  case  of  the  traditional  Fourier  transformation,  the  convolution  theorem  is  well-known. 
That  is,  convolution  in  time  is  equivalent  to  multiplication  in  the  transform  domain.  What 
is  the  corresponding  theorem  for  the  case  of  the  filter  bank  transformer?  The  answer  turns 
out  to  be  particularly  simple  for  the  case  of  orthonormal  (paraunitary)  filter  banks,  and  in 
fact  offers  some  practical  advantages  (coding  gain)  in  finite-precision  implementations.  This 
talk  will  address  these  issues. 
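
The classical Fourier convolution theorem that the abstract takes as its starting point can be checked numerically. The sketch below covers only that classical case, in its discrete (DFT) form where the convolution is circular; it does not attempt the filter bank version of the theorem, and the test sequences are arbitrary.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolution(x, y):
    """Time-domain circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * y[(t - m) % n] for m in range(n)) for t in range(n)]


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
y = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

direct = circular_convolution(x, y)
# Convolution in time equals pointwise multiplication in the transform domain.
via_transform = idft([a * b for a, b in zip(dft(x), dft(y))])
max_error = max(abs(d - v) for d, v in zip(direct, via_transform))
```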

WAVELETS,  FILTER  BANKS,  AND  APPLICATIONS 
Martin  Vetterli 
Columbia  University 

Recent  results  on  the  connection  between  wavelets  and  filter  banks  will  be  reviewed.  These 
include  FIR/IIR  constructions,  as  well  as  multidimensional  ones. 

The  question  of  arbitrary  linear  tilings  of  the  time/frequency  or  phase  space  will  be 
addressed,  showing  that  short-time  Fourier  and  wavelet  decompositions  are  two  special  cases. 

Finally,  applications  in  compression  will  be  discussed.  It  will  be  indicated  that  subband 
coding  schemes,  which  are  essentially  identical  to  wavelet  methods,  have  been  well  studied 
over the last 15 years and achieve interesting compression results, but no spectacular improvements.

MULTIRESOLUTION  STOCHASTIC  MODELS  AND  FRACTAL  REGULARIZATION 

Alan  Willsky 

Massachusetts  Institute  of  Technology 

In  this  talk  we  describe  our  continuing  effort  to  develop  a  framework  for  modeling  stochastic 
processes  at  multiple  resolutions  and  for  developing  efficient  signal  and  image  processing 
algorithms  based  on  these  models.  We  also  illustrate  the  potential  of  this  approach  in  one 
context, namely as the basis for the “fractal regularization” of ill-posed image processing and
computer  vision  problems.  Other  potential  areas  of  application  will  also  be  touched  upon. 
WAVELETS, SELF-SIMILAR SIGNALS, AND FRACTAL MODULATION
Gregory W. Wornell and Alan Oppenheim
Massachusetts Institute of Technology

Orthonormal wavelet bases provide highly efficient representations for several classes of self-similar
signals. One such collection of self-similar signals we refer to as dy-homogeneous
signals because they generalize the well-known homogeneous functions. These signals, which
are characterized in terms of a deterministic scale-invariance relation, can be categorized
into two classes: energy-dominated and power-dominated. We present wavelet-based constructions
of orthonormal self-similar bases for the representation of such signals, as well as
efficient discrete-time algorithms for their manipulation.

Synthesis of dy-homogeneous signals is potentially important in a range of engineering
applications, including remote-sensing and communications. In the communications context,
we demonstrate that orthonormal self-similar bases lead to an efficient strategy for embedding
information into a self-similar signal on multiple time scales. The resulting “fractal
modulation” strategy constitutes an interesting paradigm for communication that is naturally
suited for use with noisy channels of unknown duration and bandwidth.
A Time Domain View of Filter Banks and Wavelets

Kambiz  Nayebi,  Thomas  P.  Barnwell  III,  and  Mark  J.  T.  Smith 

Digital  Signal  Processing  Laboratory 
School  of  Electrical  Engineering 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 


Abstract 

In  this  paper,  we  consider  a  time  domain  approach  to 
the  reconstruction  problem  for  filter  bank  and  wavelet 
decompositions.  The  development  of  this  time  domain 
reconstruction  theory  has  resulted  in  a  unified  design 
approach for uniform, nonuniform, low delay, and efficient
analysis-synthesis systems. The theory is based
on  FIR  filters  and  can  also  be  applied  to  the  design  of 
perfect  reconstruction  systems  based  on  wavelets  for 
multi-resolution  analysis-synthesis  systems. 

In this paper, the procedure for the design of general
decomposition and reconstruction systems, called
Generalized Lapped Transforms (GLT), is discussed.
GLT’s include many classical transforms and the
discrete-time wavelet transform (DTWT) as special
cases. The new design procedure is used to design
wavelets and DTWT systems. Because of the generality
of the framework, regularity and phase conditions
can easily be imposed on the wavelet. Also, because
the design procedure can be used to design nonuniform
band systems, systems with higher frequency resolution
than dyadic wavelet-based systems can be designed
and realized. A number of design examples are
included in the paper.

1  Introduction 

Signal analysis and reconstruction based on filter
banks has been a popular field of research over the
last decade. Many different time-frequency representations
can be interpreted and implemented as filter
bank structures. For example, the discrete short-time
Fourier transform (DSTFT), which has long been
used to generate spectrograms, can be implemented
in a filter bank structure [1]. The discrete wavelet
transform is also typically implemented using a
tree-structured filter bank system [2]. From a more general
point of view, filter bank systems are time-frequency
block transforms in which the transformation matrix
is M × N and M < N (where N is the length of the
longest system filter). From this viewpoint, filter bank
systems include classical block transforms (DSTFT,
Discrete Cosine Transform, etc.) and Lapped Orthogonal
Transforms (LOT's) [3] as special cases.

This paper was published in the proceedings of the 25th
Asilomar Conference on Signals, Systems and Computers,
November 1991.

Recent research on the design of perfect reconstruction
analysis-synthesis systems has produced some significant
results [3-8]. The invention of Conjugate
Quadrature Filters (CQF's), perfect reconstruction filter
banks with N = 2M, lossless analysis-synthesis
systems, perfect reconstructing modulated
analysis-synthesis filter banks, and the design of nonuniform
filter banks with rational sampling rate changes are
among the significant achievements within the last
decade.

The time domain formulation introduced in [6,9,10]
provides a unified framework for the design of a wide
variety of analysis-synthesis systems which includes
all known structures based on FIR filter banks. In
the past, almost all decomposition-reconstruction systems
based on filter banks were designed by analysis.
This means that the design was accomplished by a
thorough and complete analysis of the system of interest.
In the new time domain design methodology,
a complete analysis of the system is not required in
order to accomplish the design. In fact, the only information
that is fundamentally necessary in the design
process is the desired system structure. Obviously, for
the design to be possible, the system structure must
be consistent with the desired frequency resolution.
This design approach provides an elegant and powerful
procedure for designing systems which are not
completely understood analytically. For example, it
was not known that it was possible to design low and
minimum delay systems with perfect reconstruction
until they were designed using the time domain design
technique [11].

From a signal analysis viewpoint, analysis-synthesis
systems are used to project the input signal onto different
signal subspaces, each of which may have different
time and/or frequency characteristics. Enough
information is preserved in the signal subspaces so that
the signal can be reconstructed. It should be clear
that the class of analysis-synthesis systems based on
filter banks, which is perhaps better called the class of
fixed overlapping block transforms, is very large and
includes many of the well known transforms as special
cases. It is also obvious that there are infinitely
many unique time-frequency decompositions that can
be achieved using nonuniform band systems. These
transforms can be considered to be an invertible mapping
of the signal onto the time-frequency plane. The
resolution in the time-frequency plane depends on the
transformation characteristics.

Figure 1: A simple block diagram of the generalized
lapped transform.

The ability of the time domain approach to design
systems with arbitrary nonuniform decompositions
makes it possible to create a very large family
of overlapped block transforms with any desired time
and frequency resolution (within the constraints of the
uncertainty principle). Because the approach can design
fixed lapped transform systems, and because the
DTWT is a fixed lapped transform, it can be used to
design wavelets with constrained regularity and phase
properties. The relationship between compactly supported
wavelets and filter banks is described in recent
literature including [12,2]. In the next section, a general
view of analysis-synthesis systems as lapped transforms
is presented. It is shown that the STFT and the
DTWT are special cases of this general transform. In
the remainder of the paper, a summary of the time
domain framework and its applications to the design
of wavelets and DTWT systems are presented.

2  Generalized  Lapped  Trans¬ 
forms 

In the DSTFT, the signal is typically viewed through
a lowpass analysis window h(n). To analyze the entire
signal, the window is shifted in time and the DFT of
the windowed signal is computed. In the resulting representation,
the signal is mapped onto the time-frequency
plane, in which the time and frequency resolution is
fixed. The resolution in the time-frequency plane is
determined by the amount of the window shift, the
shape of the window function and the length of the
DFT. From a filter bank point of view, the DSTFT is
the decimated output of a bank of complex modulated
filters whose prototype filter impulse response is the
analysis window h(n). From a signal decomposition
point of view, the DSTFT may be interpreted as the
decomposition of the signal into different sub-signals
using a set of basis vectors and their time-shifted versions.
In the DSTFT, these basis vectors are defined
by the analysis window and its frequency modulated
versions.
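
The sliding-window description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dstft`, the length-8 Hann-type window, the shift of 4 samples, and the cosine test signal are all choices made here, not taken from the paper.

```python
import cmath
import math


def dstft(x, window, hop):
    """Shift `window` over `x` in steps of `hop` and take the DFT of each windowed block."""
    n_fft = len(window)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        block = [x[start + n] * window[n] for n in range(n_fft)]
        spectrum = [
            sum(block[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ]
        frames.append(spectrum)
    return frames


# Hann-type lowpass analysis window h(n) of length 8 (an illustrative choice).
N = 8
h = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

# Test signal: a cosine with exactly 2 cycles per 8 samples.
x = [math.cos(2 * math.pi * 2 * n / N) for n in range(32)]

frames = dstft(x, h, hop=4)
magnitudes = [abs(v) for v in frames[0]]
# Strongest DFT bin among the non-negative frequencies of the first frame.
peak_bin = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
```

Every frame uses the same window, so the time and frequency resolution is the same everywhere in the plane; only the window shape, the shift `hop`, and the DFT length change it.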

The DTWT offers an alternative time-frequency
(known as time-scale) representation in which the resolution
in the time-frequency plane is not uniform.
As opposed to the STFT, in which the signal is viewed
from a window of fixed size and shape, the DTWT is
based on a series of different windows which are related
to a continuous wavelet function w(t).

Each window is of different length and corresponds
to a bandpass filter with a center frequency indexed by
the scale j = 0, 1, ... In the analysis of the signal, the
signal is decomposed into different sub-signals at different
resolutions. From a sliding window perspective,
the sliding rate of different windows depends on
their length. Thus, shorter windows are moved more
frequently than longer ones. This results in a
time-frequency representation that has high temporal
resolution in high frequencies and low temporal resolution
(high frequency resolution) in low frequencies. From a
filter bank perspective, the DTWT is an octave-band
tree-structure in which the low frequency band is further
divided into high and low bands and downsampled.
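
The octave-band tree can be sketched as follows. The sketch assumes the two-tap Haar filter pair purely for brevity (the paper's examples use longer filters), assumes the signal length is divisible by 2 to the power `levels`, and uses function names invented for this illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)


def analysis_split(x):
    """One two-band stage with Haar filters: filter, then downsample by 2."""
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return low, high


def synthesis_merge(low, high):
    """Inverse of analysis_split for the Haar pair."""
    x = []
    for a, d in zip(low, high):
        x.extend([S * (a + d), S * (a - d)])
    return x


def dtwt(x, levels):
    """Octave-band tree: only the low band is split again at each level."""
    bands = []
    low = list(x)
    for _ in range(levels):
        low, high = analysis_split(low)
        bands.append(high)
    bands.append(low)
    return bands


def inverse_dtwt(bands):
    """Rebuild the signal by merging from the deepest level upwards."""
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = synthesis_merge(low, high)
    return low


x = [float(n % 5) for n in range(16)]
bands = dtwt(x, levels=3)
lengths = [len(b) for b in bands]
x_rec = inverse_dtwt(bands)
```

The band lengths halve at every level, which is the "shorter windows are moved more frequently" behaviour described above: the highest band keeps half of the samples, and the deepest bands keep only a few.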

Both the DSTFT and the DTWT can be considered
to be special cases of a more general time-frequency
transformation which we will refer to as the
Generalized Lapped Transform (GLT). The GLT includes
DSTFT's, uniform band filter banks, DSTCTs,
LOT's, and DTWT's as special cases. These are all
examples of fixed lapped transforms, where the basis
vectors do not change with time. The GLT also can
include systems in which the basis vectors change with
time. The time domain design procedure can be used
to design any GLT.

In the generalized lapped transform, the characteristics
of analysis windows and the resolution in the
time-frequency plane are only restricted by the reconstruction
requirement of the input signal (equivalently,
the invertibility of the transformation). As in
the DSTFT and the DTWT, the GLT can be implemented
in a filter bank structure whose general form is
presented in Figure 1. In such a transform, the input
signal is decomposed into a set of sub-signals with different
resolutions. To achieve such a multi-resolution
time-frequency representation of the signal, the signal
is passed through a set of upsamplers before the transformation
is applied. The redundancy of the representation
is reduced by downsampling the sub-signals to
appropriate rates. The basis vectors are designed to
have the required time and frequency characteristics.
Figure 2: (a) Scaling function, $\phi_0$, generated from CQF-8, (b) Scaling function $\phi_1$, (c) A closer view of
$\phi_0$ and $\phi_1$ in the range [3.5, 4.7].

3  Time  Domain  Framework 

The basic idea behind the time domain formulation
of analysis-synthesis structures is to determine the set
of necessary and sufficient conditions for exact reconstruction
of the input at the system output. These
conditions are expressed in a proper matrix product
form which is used in the design procedure. The conditions
can be expressed as

$$A_i s_i = b_i, \qquad i = 0, 1, \ldots, T-1, \qquad (1)$$

where T is the shift-invariance period of the system.
The elements of matrix $A_i$ are related to the analysis
basis vectors and the elements of $s_i$ are related to the
synthesis basis vectors [13]. The vector $b_i$ is constant.

After expressing the reconstruction conditions of the
system (i.e. the invertibility conditions of the transform)
in a matrix form, a cost function is defined as

$$c = \sum_{i=0}^{T-1} \| A_i s_i - b_i \|. \qquad (2)$$

This cost function is minimized (and brought to zero
for perfect reconstruction, if possible) using an optimization
routine. Other system constraints which cannot
be directly incorporated into the formulation (such
as frequency responses) can also be added to the cost
function for minimization. In the next section, the
design approach is applied to the design of compactly
supported wavelets and DTWT systems.
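
A rough numerical stand-in for such a cost can be written without forming the matrices explicitly. The sketch below is not the construction of [13]: it assumes an M-band maximally decimated FIR system whose filters all have the same length, drives it with one unit impulse per phase of the shift-invariance period (T = M here), and sums the norms of the deviations from a delayed impulse. All function names and the Haar test filters are choices made for this illustration.

```python
import math


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def analysis_synthesis(x, analysis, synthesis, M):
    """Pass x through an M-band maximally decimated FIR analysis-synthesis system."""
    y = None
    for h, g in zip(analysis, synthesis):
        sub = convolve(x, h)[::M]          # filter, then keep every M-th sample
        up = [0.0] * (M * len(sub))
        up[::M] = sub                      # upsample by inserting zeros
        branch = convolve(up, g)
        y = branch if y is None else [p + q for p, q in zip(y, branch)]
    return y


def reconstruction_cost(analysis, synthesis, M, delay, length=32):
    """Sum over the M input phases of ||output - delayed impulse||."""
    cost = 0.0
    for i in range(M):
        x = [0.0] * length
        x[i] = 1.0
        y = analysis_synthesis(x, analysis, synthesis, M)
        target = [0.0] * len(y)
        target[i + delay] = 1.0
        cost += math.sqrt(sum((p - q) ** 2 for p, q in zip(y, target)))
    return cost


s = 1.0 / math.sqrt(2.0)
haar_analysis = [[s, s], [s, -s]]
haar_synthesis = [[s, s], [-s, s]]          # time-reversed analysis filters
cost_pr = reconstruction_cost(haar_analysis, haar_synthesis, M=2, delay=1)

perturbed_synthesis = [[s, 0.9 * s], [-s, s]]
cost_bad = reconstruction_cost(haar_analysis, perturbed_synthesis, M=2, delay=1)
```

Because the system is linear and periodically shift-invariant, testing one impulse per phase is enough to decide whether the cost vanishes; an optimization routine would adjust the synthesis (or analysis) coefficients to drive a cost of this kind towards zero.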


Figure 3: (a) Wavelet, $\psi_0$, generated from CQF-8, (b)
Wavelet $\psi_1$.


Figure 4: (a) Synthesis scaling function, (b) Synthesis
wavelet, which correspond to $\phi_1$ and $\psi_1$.

4  Wavelet  Design 

In this section, the flexibility of the time domain
design procedure in designing compactly supported
wavelets is illustrated. Examples are designed to exhibit
the flexibility of the design approach in imposing
a regularity condition on the wavelet, or a low delay or
linear phase property on the wavelet, and to show their
effects on the wavelet. Both orthogonal and biorthogonal
wavelets are designed and presented. To design
orthogonal wavelets, the synthesis filter coefficients are
chosen to be the time-reversed version of the analysis
filters. In designing biorthogonal wavelets, the synthesis
filters are chosen to be

$$g_0(n) = h_1(n)(-1)^n \qquad (3)$$

$$g_1(n) = -h_0(n)(-1)^n. \qquad (4)$$
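
The choice in (3) and (4) can be checked numerically. In z-transform terms it reads G0(z) = H1(-z) and G1(z) = -H0(-z), so the alias term H0(-z)G0(z) + H1(-z)G1(z) cancels for any analysis pair; whether the remaining distortion term is a pure delay still has to be arranged by the design. The sketch below assumes arbitrary illustrative coefficients and function names that are not from the paper.

```python
def modulate(h):
    """Multiply h(n) by (-1)^n, i.e. replace H(z) by H(-z)."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]


def convolve(a, b):
    """Full linear convolution (polynomial product in z^-1)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def synthesis_from_analysis(h0, h1):
    """Synthesis filters according to equations (3) and (4)."""
    g0 = modulate(h1)
    g1 = [-c for c in modulate(h0)]
    return g0, g1


def alias_term(h0, h1, g0, g1):
    """Coefficients of H0(-z)G0(z) + H1(-z)G1(z); all zero means aliasing cancels."""
    t0 = convolve(modulate(h0), g0)
    t1 = convolve(modulate(h1), g1)
    return [p + q for p, q in zip(t0, t1)]


# Arbitrary illustrative analysis filters (not a designed wavelet pair).
h0 = [1.0, 2.0, 3.0, 1.0]
h1 = [0.5, -1.0, 0.25, 2.0]
g0, g1 = synthesis_from_analysis(h0, h1)
alias = alias_term(h0, h1, g0, g1)
```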

First, we consider the imposition of regularity constraints
on the wavelet. Figure 2a shows the scaling function
generated from the 8-tap Conjugate Quadrature
Filter (CQF-8). This scaling function is referred to as
$\phi_0$ in this paper. The relative irregularity of this function
and its corresponding wavelet ($\psi_0$), shown in Figure 3a,
is obvious. Scaling functions are generated by
iterating the system lowpass filter eight times. To impose
regularity on the wavelet, a zero at $z = -1$ is imposed
on the lowpass analysis filter, $h_0(n)$. To impose
similar regularity on the synthesis side, a zero at $z = 1$
is imposed on the highpass filter, $h_1(n)$. Figure 2b
shows the scaling function of this system, referred to
as $\phi_1$, which is much smoother than $\phi_0$. Figure 3b
shows the corresponding relatively regular wavelet, $\psi_1$.
Figure 2c shows a closer view of both scaling functions
in the range [3.5, 4.7]. Since the synthesis wavelet
is different from the analysis wavelet, Figure 4 shows
the synthesis scaling function and wavelet. To make
the wavelet even more regular, two zeros are located
at $z = -1$ for $h_0(n)$ and two zeros at $z = 1$ for $h_1(n)$.
The resulting scaling function and wavelet are referred
to as $\phi_2$ and $\psi_2$, respectively.
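
Both ingredients of this paragraph, iterating the lowpass filter and checking for a zero at z = -1, can be sketched as below. The CQF-8 coefficients are not listed in the paper, so the sketch substitutes the well-known 4-tap Daubechies lowpass filter as a stand-in; the function names are likewise assumptions made here.

```python
import math


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def upsample(h, factor):
    """Insert factor-1 zeros between samples, i.e. replace H(z) by H(z^factor)."""
    out = [0.0] * ((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out


def iterate_lowpass(h, iterations):
    """Coefficients of H(z) H(z^2) ... H(z^(2^(iterations-1)))."""
    result = [1.0]
    for i in range(iterations):
        result = convolve(result, upsample(h, 2 ** i))
    return result


def response_at_minus_one(h):
    """H(-1) = sum of h(n)(-1)^n; a value of zero means a zero at z = -1."""
    return sum(c if n % 2 == 0 else -c for n, c in enumerate(h))


# Stand-in lowpass filter: 4-tap Daubechies coefficients (not the paper's CQF-8).
r3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
d4 = [(1 + r3) / norm, (3 + r3) / norm, (3 - r3) / norm, (1 - r3) / norm]

# Eight iterations, as in the text; the result samples the scaling function up to scale.
phi = iterate_lowpass(d4, 8)
```

Plotting `phi` against its index gives a sampled approximation of the scaling function's shape; a filter without a zero at z = -1 produces a visibly rougher curve under the same iteration.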

To compare the regularity properties of the three
wavelets, the spectra of the three scaling functions,
$\Phi_0(\omega)$, $\Phi_1(\omega)$, and $\Phi_2(\omega)$, are compared. In Figure 5,
$\Phi_0$ and $\Phi_1$ are shown side-by-side for comparison. As
seen from this figure, the high frequency components
of $\Phi_1$ are significantly smaller than those of $\Phi_0$, which
results in a smoother wavelet function. Figure 6 compares
$\Phi_1$ and $\Phi_2$. As seen in this figure, since $\Phi_2$
is generated from a lowpass filter with two zeros at
$z = -1$, it has a higher decay rate than $\Phi_1$.

To understand the effects of imposing zeros at $z = -1$
to achieve regularity, Figure 7 shows the log magnitude
response of the lowpass filters used to generate $\phi_0$
and $\phi_1$. The two filters have the same passband and
transition characteristics; imposing one of the zeros at
$z = -1$ costs a few dB of stopband attenuation.

To exhibit the flexibility of the design approach,
three more design examples are shown. Figure 8 shows
the analysis and the synthesis wavelets with linear
phase. These correspond to a two-band system with
16-tap system filters and two zeros at $z = -1$ for both
analysis and synthesis sections. Orthogonal wavelets
can also be easily designed. Figure 9a shows the scaling
function and Figure 9b shows the corresponding
orthogonal wavelet function. The regularity in this
case is achieved by imposing two zeros at $z = -1$
for $h_0(n)$. In the last example, the design approach
is applied to design low delay wavelet
decomposition-reconstruction systems. In these systems,
both analysis and synthesis wavelets have low group delay,
which results in a lower total system delay. Figure 10 shows
the analysis and synthesis wavelets of such a system.
These wavelets correspond to a two-band system with
16-tap filters and only 10 samples of system delay.
This delay is normally 15 samples for CQF and QMF
based systems. In this example, two zeros are imposed
at $z = -1$ for $h_0(n)$ and at $z = 1$ for $h_1(n)$.


5  Conclusion 

In this paper, the generalized lapped transform is
introduced in its general form. It is shown that many
well known transforms, including the wavelet transform,
are special cases of the GLT. The time domain
formulation of general analysis-synthesis systems
based on FIR filters is used to design wavelets with
different constraints. The constraints include regularity,
linear phase, low delay, and orthogonality.
All of these can be imposed using the same design
procedure.

References 

[1] M. R. Portnoff, “Time-frequency representation of
digital signals and systems based on short-time
Fourier analysis,” IEEE Transactions ASSP, pp. 55-69,
February 1980.

[2] M. Vetterli and C. Herley, “Wavelets and Filter
Banks: Relationships and New Results,” Proceedings
ICASSP, pp. 1723-1726, April 1990.

[3] H. S. Malvar and D. H. Staelin, “The LOT: Transform
Coding without Blocking Effects,” IEEE Transactions
on ASSP, pp. 553-559, April 1989.

[4] M. J. T. Smith and T. P. Barnwell, “Exact reconstruction
techniques for tree-structured subband coders,”
IEEE Transactions ASSP, pp. 434-441, June 1986.

[5] J. P. Princen and A. B. Bradley, “Analysis/Synthesis
Filter Bank Design Based on Time Domain Aliasing
Cancellation,” IEEE Transactions ASSP, vol. ASSP-34,
October 1986.

[6] K. Nayebi, T. P. Barnwell, and M. J. T. Smith, “The
Time Domain Analysis and Design of Exactly Reconstructing
FIR Analysis/Synthesis Filter Banks,” Proceedings
ICASSP, pp. 1735-1738, April 1990.

[7] P. P. Vaidyanathan, “Theory and design of M channel
maximally decimated QMF with arbitrary M, having
perfect reconstruction property,” IEEE Transactions
on ASSP, April 1987.

[8] M. Vetterli and D. L. Gall, “Perfect Reconstruction
FIR Filter Banks: Lapped Transforms, Pseudo
QMF's and Paraunitary Matrices,” Proceedings ISCAS,
pp. 2249-2253, 1988.

[9] K. Nayebi, T. P. Barnwell, and M. J. T. Smith, “General
Time Domain Analysis and Design Framework for
Exactly Reconstructing FIR Analysis/Synthesis Filter
Banks,” Proceedings ISCAS, pp. 2022-2025, May
1990.

[10] K. Nayebi, T. P. Barnwell, and M. J. T. Smith, “Time
domain filter bank analysis: A new design theory,” To
be published, Trans. on ASSP, June 1992.

[11] K. Nayebi, T. P. Barnwell, and M. J. T. Smith, “Design
of low delay FIR analysis-synthesis filter bank
systems,” Proc. Conf. on Information Sciences and
Systems, 1991.

[12] I. Daubechies, “Orthonormal bases of compactly supported
wavelets,” Comm. in Pure and Applied Math.,
vol. 41, pp. 909-996, 1988.

[13] K. Nayebi, T. P. Barnwell, and M. J. T. Smith, “The
design of perfect reconstruction nonuniform band filter
banks,” Proceedings ICASSP, 1991.




Figure 5: Magnitude of $\Phi_0(\omega)$ (left) and $\Phi_1(\omega)$ (right)
in dB.


Figure 6: Magnitude of $\Phi_1(\omega)$ (left) and $\Phi_2(\omega)$ (right)
in dB.


Figure 7: Frequency response of the lowpass analysis
filter with no zeros at $z = -1$ (full line) and one zero
at $z = -1$ (dashed line).


Figure 8: (a) Analysis wavelet, (b) Synthesis wavelet
with linear phase generated from 16-tap filters with two
zeros at $z = -1$.


Figure 9: (a) Scaling function, (b) Orthonormal
wavelet generated from 16-tap orthogonal filters with
two zeros at $z = -1$.


Figure 10: (a) Analysis low delay wavelet, (b) Synthesis
low delay wavelet generated from 16-tap filters of a
10 sample delay system with two zeros at $z = -1$.
Abstract. Irregular sampling expansions are proved in an elementary way by an analysis of
the  inverse  frame  operator.  The  expansions  are  of  two  dual  types:  in  the  first,  the  sampled 
values  at  irregularly  spaced  points  are  the  coefficients;  in  the  second,  the  sequence  of  sampling 
functions  are  irregularly  spaced  translates  of  a  single  sampling  function.  The  results  include 
regular  sampling  theory  as  well  as  the  irregular  sampling  theory  of  Paley-Wiener,  Levinson, 
Beutler,  and  Yao-Thomas.  The  use  of  frames  also  gives  rise  to  a  new  interpretation  of  aliasing. 


1  Introduction. 

The subject of sampling, whether as method, point of view, or theory, weaves its fundamental
ideas through a panorama of engineering, mathematical, and scientific disciplines.
Sampling is so pervasive that excellent expositions and surveys abound; [BSS] and [Hi2]
are two such papers that are particularly appropriate for our perspective. Alas, our contributions
focus on an important result by Köthe [K] (1936), on a new look at Duffin and
Schaeffer's theory of frames [DS] (1952) in light of the emergence of wavelet theory, and
on effective, elementary, and unifying methods for irregular sampling in terms of frames.

Köthe was the first to prove that all bounded unconditional bases are equivalent in a
given separable Hilbert space. An explanation of this result and its relationship with the
theory of frames are the content of Theorem 2.5. Section 2 presents a crisp compendium
of frame theory with Theorem 2.5 as its focal point. To titillate the reader during this
dry compilation, we've pointed out yet another “first” by Vitali [V] in Remark 2.3. The
technical device we extricate from Section 2 is the inverse frame operator $S^{-1}$ for weighted
Fourier frames associated with the lattice $\{(na, mb)\}$, e.g., Definition 2.6 and Theorems
2.7 and 2.8; and this operator is our basic tool in proving regular sampling theorems.

The first named author is also Professor of Mathematics at the University of Maryland, College Park,
MD 20742. His work was supported in part by DARPA Contract DAAH01-90-C-0667.

Section 3 is devoted to classical regular sampling expansions of the form,

$$(1) \qquad f(t) = \sum_{n=-\infty}^{\infty} f(nT)\, s(t - nT),$$

where $T > 0$ is the sampling rate, $\{f(nT)\}$ is the set of regularly sampled values of
the signal $f$, and $s$ is the sampling function. The point of Section 3 is to prove (1)
quickly in terms of frame decompositions. Regular sampling involves orthonormal bases
of exponentials, and $S^{-1}$ is used as a multiplier, e.g., Theorem 3.1 and Theorem 3.3.
An important consequence of this line of thinking is a new interpretation and explanation
of aliasing in terms of frames, e.g., Section 3.2.

Since our main goal is to prove irregular sampling expansions analogous to (1), we develop
the theory of weighted Fourier frames associated with irregular “lattices” $\{(a_n, b_m)\}$
in Section 4.

Our results on irregular sampling are the subject of Sections 5 and 6. In Section 5,
irregularly sampled values of $f$ are used in the expansions analogous to (1). In Section 6,
irregular translates of a single sampling function are used in the expansions analogous to
(1). These two expansions are dual in the context of frame theory in a way that is explained
in the text. The results in Section 5 use special frames associated with Köthe's work,
and include the completeness and sampling theory of Paley-Wiener, Levinson, Beutler,
and Yao-Thomas. The results in Section 6 use ordinary frames, and lead to an algorithm
providing insight into the role of irregularly sampled values for the expansions of this
section. The irregular sampling of Section 5 involves bounded unconditional bases, and
$S^{-1}$ is used in terms of biorthonormality, cf., our remark above, about Section 3, on the
role of $S^{-1}$.

We indicated at the outset that sampling ideas have diverse theoretical foundations and
catholic applicability. As such, the sequel to this paper has two components. First, there is
a critical comparison in Part II of other approaches to irregular sampling, cf., the analysis
by Feichtinger and Gröchenig [FG]. Second, as regards applicability, Part II contains
results dealing with aliasing, the algorithm, stability, and higher dimensions, all in the
context of our frame theoretic approach. We have already indicated our technical direction
for aliasing and the algorithm in Sections 3 and 6, respectively. In Part II, the aliasing
method is fully developed for the irregular sampling case, and an error analysis is conducted
on the algorithm for various truncations of the inverse frame operator. Our approach to
stability builds on the ideas of Yao and Thomas [YT], and ties in with the results of
Beurling and Malliavin [BM] and Landau [La]. Our approach to higher dimensions is
direct.

Besides the usual notation in analysis as found in the books by Hörmander [Hö], Schwartz
[S], and Stein and Weiss [SW], we shall use the conventions and notation described at the
end of the paper.

Finally, in this paper we have only proved convergence in the norm. All of our results
have been proved for other modes of convergence, and details are found in [H]. Also, we
have dealt exclusively with bandlimited sampling functions.


2  Riesz  bases  and  frames. 

2.1 Definition. a. A sequence $\{g_n\} \subseteq H$, a separable Hilbert space, is a frame if there
exist $A, B > 0$ such that

$$\forall f \in H, \qquad A\|f\|^2 \le \sum_n |(f, g_n)|^2 \le B\|f\|^2,$$

where $(\ ,\ )$ is the inner product on $H$ and the norm of $f \in H$ is $\|f\| = (f, f)^{1/2}$. $A$ and
$B$ are the frame bounds, and a frame $\{g_n\}$ is tight if $A = B$. A frame is exact
if it is no longer a frame when any one of its elements is removed. Clearly, if $\{g_n\}$ is an
orthonormal basis of $H$ then it is a tight exact frame with $A = B = 1$.

b. The frame operator of the frame $\{g_n\}$ is the function $S : H \to H$ defined as

$$Sf = \sum_n (f, g_n)\, g_n.$$
The theory of frames is due to Duffin and Schaeffer [DS] in 1952. Expositions include
[Y] and [HW], the former presented in the context of non-harmonic Fourier series and the
latter in the setting of wavelet theory.

2.2 Theorem. Let $\{g_n\} \subseteq H$ be a frame with frame bounds $A$ and $B$.

a. $S$ is a topological isomorphism with inverse $S^{-1} : H \to H$. $\{S^{-1}g_n\}$ is a frame with
frame bounds $B^{-1}$ and $A^{-1}$, and

$$\forall f \in H, \qquad f = \sum_n (f, S^{-1}g_n)\, g_n = \sum_n (f, g_n)\, S^{-1}g_n.$$

The first expansion is the frame expansion and the second is the dual frame expansion.

b. If $\{g_n\}$ is tight, $\|g_n\| = 1$ for all $n$, and $A = B = 1$, then $\{g_n\}$ is an orthonormal
basis of $H$.

c. If $\{g_n\}$ is exact, then $\{g_n\}$ and $\{S^{-1}g_n\}$ are biorthonormal, i.e.

$$\forall m, n, \qquad (g_m, S^{-1}g_n) = \delta_{mn}.$$
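
A small finite-dimensional case makes part a concrete. The sketch below assumes the real Hilbert space $H = \mathbb{R}^2$ and a three-element frame chosen here for illustration; it builds the frame operator, its inverse, and the dual frame, then evaluates both expansions. The names are hypothetical and nothing in it is specific to the infinite-dimensional setting of the paper.

```python
import math


def inner(u, v):
    """Inner product on R^2."""
    return sum(a * b for a, b in zip(u, v))


def frame_operator(frame):
    """Matrix of S f = sum_n (f, g_n) g_n for a frame of R^2."""
    return [[sum(g[i] * g[j] for g in frame) for j in range(2)] for i in range(2)]


def invert_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def apply(m, v):
    return [inner(row, v) for row in m]


# Three vectors spanning R^2: a frame that is neither tight nor exact.
frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = frame_operator(frame)
S_inv = invert_2x2(S)
dual = [apply(S_inv, g) for g in frame]       # the dual frame {S^-1 g_n}

# Optimal frame bounds are the extreme eigenvalues of the symmetric matrix S.
half_trace = (S[0][0] + S[1][1]) / 2.0
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
root = math.sqrt(half_trace ** 2 - det)
A, B = half_trace - root, half_trace + root

f = [3.0, -2.0]
# Frame expansion: sum_n (f, S^-1 g_n) g_n
frame_expansion = [sum(inner(f, d) * g[i] for g, d in zip(frame, dual)) for i in range(2)]
# Dual frame expansion: sum_n (f, g_n) S^-1 g_n
dual_expansion = [sum(inner(f, g) * d[i] for g, d in zip(frame, dual)) for i in range(2)]
```

Since this frame has a redundant element it is not exact, and the biorthonormality of part c indeed fails: $(g_0, S^{-1}g_0) = 2/3$ rather than 1.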

2.3 Remark. We comment on part b because it is surprisingly useful and because of a
stronger result by Vitali (1921) [V].

To prove b we first use tightness and $A = 1$ to write,

$$\|g_m\|^2 = \|g_m\|^4 + \sum_{n \ne m} |(g_m, g_n)|^2,$$

and obtain that $\{g_n\}$ is orthonormal since each $\|g_n\| = 1$. To conclude the proof we then
invoke the well-known result: if $\{g_n\} \subseteq H$ is orthonormal then it is an orthonormal basis
of $H$ if and only if

$$\forall f \in H, \qquad \|f\|^2 = \sum_n |(f, g_n)|^2.$$

In 1921, Vitali proved that an orthonormal sequence $\{g_n\} \subseteq L^2[a, b]$ is complete, and so
$\{g_n\}$ is an orthonormal basis, if and only if

$$(2.1) \qquad \forall t \in [a, b], \qquad \sum_n \left| \int_a^t g_n(u)\, du \right|^2 = t - a.$$

For the case $H = L^2[a, b]$, Vitali's result is stronger than part b since (2.1) is tightness
with $A = 1$ for functions $f = \mathbf{1}_{[a,t]}$.

Other remarkable and important contributions by Vitali are highlighted in [B].
2.4 Definition. Let $H$ be a separable Hilbert space. A sequence $\{g_n\} \subseteq H$ is a Schauder
basis or basis of $H$ if each $f \in H$ has a unique decomposition $f = \sum c_n(f)\, g_n$. A basis
$\{g_n\}$ is an unconditional basis if

$\exists\, C$ such that $\forall F \subseteq \mathbb{Z}$, where card $F < \infty$, and
$\forall b_n, c_n \in \mathbb{C}$, where $n \in F$ and $|b_n| < |c_n|$,

$$\Big\| \sum_{n \in F} b_n g_n \Big\| \le C\, \Big\| \sum_{n \in F} c_n g_n \Big\|.$$

An unconditional basis $\{g_n\}$ is bounded if

$$\exists\, A, B > 0 \ \text{such that}\ \forall n, \qquad A \le \|g_n\| \le B.$$

Separable Hilbert spaces have orthonormal bases, and orthonormal bases are bounded
unconditional bases.

Köthe's result mentioned in Section 1 is the implication, b implies c, of the following
theorem. The implication, c implies b, is straightforward; and the equivalence of a and c
is found in [Y, pp. 188-189].

2.5 Theorem. Let $H$ be a separable Hilbert space and let $\{g_n\} \subseteq H$ be a given sequence.
The following are equivalent:

a. $\{g_n\}$ is an exact frame for $H$;

b. $\{g_n\}$ is a bounded unconditional basis of $H$;

c. $\{g_n\}$ is a Riesz basis, i.e., there is an orthonormal basis $\{u_n\}$ and a topological
isomorphism $T : H \to H$ such that $Tg_n = u_n$ for each $n$.

2.6 Definition/Remark. a. Given $g \in L^2(\mathbb{R})$ and sequences $\{a_n\}, \{b_m\} \subseteq \mathbb{R}$. Define
$(T_{a_n} g)(t) = g(t - a_n)$ and $(E_{b_m} g)(t) = e^{2\pi i t b_m} g(t)$. If $\{E_{b_m} T_{a_n} g\}$ is a frame for
$L^2(\mathbb{R})$ it is called a weighted Fourier frame with weight $g$.

b. Fourier frames $\{E_{b_n}\}$ were defined in [DS] for $L^2[-T, T]$. Gabor's seminal paper [G]
deals with “regularly latticed” systems $\{E_{mb} T_{na} g\}$ where $g$ is the Gaussian; and it turns
out that the Heisenberg group is fundamental in analyzing the structure of modulations
and translations. As such, the names “Gabor” and “Weyl-Heisenberg” have also been
associated with these systems in the case of regular lattices.

c. $\{E_{b_m} T_{a_n} g\}$ is a frame for $L^2(\mathbb{R})$ if and only if $\{T_{a_n}(E_{b_m} g)\}$ is a frame for $L^2(\mathbb{R})$.
Also, our weighted Fourier frames will often be defined for $L^2(\hat{\mathbb{R}})$. As such we note that

j)''  = 

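2.7 Theorem. Given $g \in L^2(\mathbb{R})$ and $a, b > 0$. Define

$$G(t) = \sum_n |g(t - na)|^2.$$

Assume that there exist $A, B > 0$ such that

$$(2.2) \qquad 0 < A \le G(t) \le B < \infty \quad \text{a.e. on } \mathbb{R},$$

and that supp $g \subseteq I$ where $I$ is an interval of length $1/b$. Then $\{E_{mb} T_{na} g\}$ is a frame for
$L^2(\mathbb{R})$, with frame bounds $b^{-1}A$ and $b^{-1}B$, and

$$(2.3) \qquad \forall f \in L^2(\mathbb{R}), \qquad S^{-1}f = \frac{b f}{G}.$$

2.8 Theorem. Given $g \in L^2(\mathbb{R})$ and $a, b > 0$. Assume $\{E_{na} T_{mb} g\}$ is a frame for $L^2(\mathbb{R})$.
Then

$$(2.4) \qquad S^{-1}(E_{na} T_{mb}\, g) = E_{na} T_{mb}\, S^{-1} g.$$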

2.9 Example. a. Given $g \in L^2(\mathbb{R})$ and $a, b > 0$ for which $ab = 1$. If $\{E_{mb} T_{na} g\}$ is a
frame then it is an exact frame. This remarkable fact (for $ab = 1$) can be proved using
properties of the Zak transform which we now define.

b. The Zak transform of $f \in L^2(\mathbb{R})$ is

$$Zf(x, \omega) = a^{1/2} \sum_k f(xa + ka)\, e^{2\pi i k \omega}$$

for $(x, \omega) \in \mathbb{R} \times \hat{\mathbb{R}}$ and $a > 0$. It turns out that the Zak transform is a unitary map of
$L^2(\mathbb{R})$ onto $L^2(Q)$, $Q = [0, 1) \times [0, 1)$.
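c. If $\{E_{mb}T_{na}g\}$ is a frame for $ab = 1$, it is a bounded unconditional basis (part a and Theorem 2.5); and, in particular, the frame decomposition

$$\forall f \in L^2(\mathbf{R}),\quad f = \sum_{m,n} c_{m,n}\, E_{mb}T_{na}g$$

(Theorem 2.2a) is unique. We shall verify that

(2.5) $$c_{m,n} = \langle f, S^{-1}(E_{mb}T_{na}g)\rangle = \Big\langle \frac{Zf}{Zg}, E_{m,n} \Big\rangle.$$

First, with the hypotheses that $\{E_{mb}T_{na}g\}$ is a frame for $L^2(\mathbf{R})$ and $ab = 1$, we compute

$$\forall F \in L^2(Q),\quad S_Z F = F\,|Zg|^2,$$

where $S_Z : L^2(Q) \to L^2(Q)$ is the frame operator for the frame $\{Z(E_{mb}T_{na}g)\}$. Thus,

(2.6) $$\forall f \in L^2(\mathbf{R}),\quad S_Z^{-1}(Zf) = \frac{Zf}{|Zg|^2}.$$

Next, using (2.6), we compute

$$\forall f \in L^2(\mathbf{R}),\quad Zf = S_Z S_Z^{-1}(Zf) = \sum_{m,n} \Big\langle \frac{Zf}{|Zg|^2}, E_{m,n}Zg \Big\rangle E_{m,n}Zg = \sum_{m,n} \Big\langle \frac{Zf}{Zg}, E_{m,n} \Big\rangle E_{m,n}Zg,$$

where $E_{m,n}(x, \omega) = e^{2\pi i m x} e^{2\pi i n \omega}$. Consequently,

$$\forall f \in L^2(\mathbf{R}),\quad f = \sum_{m,n} \Big\langle \frac{Zf}{Zg}, E_{m,n} \Big\rangle Z^{-1}(E_{m,n}Zg) = \sum_{m,n} \Big\langle \frac{Zf}{Zg}, E_{m,n} \Big\rangle E_{mb}T_{na}g,$$

so that (2.5) is obtained by the uniqueness of the representation.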
3. Regular sampling and weighted Fourier frames.

The theme of this section is to prove classical sampling results by frame methods in the case that the inverse frame operator $S^{-1}$ is a multiplier.

The Paley-Wiener space, $PW_\Omega$, is the subset of $L^2(\mathbf{R})$ whose elements are $\Omega$-bandlimited, i.e.,

$$PW_\Omega = \{ f \in L^2(\mathbf{R}) : \operatorname{supp} \hat{f} \subseteq [-\Omega, \Omega] \}.$$

Clearly the elements of $PW_\Omega$ are entire functions.

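3.1 Theorem. Given $T, \Omega > 0$ for which $0 < T \le \frac{1}{2\Omega}$. Then

(3.1) $$\forall f \in PW_\Omega,\quad f = T \sum_n f(nT)\, T_{nT} d_{2\pi\Omega} \quad \text{in } L^2(\mathbf{R}),$$

where $d_{2\pi\Omega}$ is the $2\pi\Omega$ dilation of the Dirichlet function

$$d(t) = \frac{\sin t}{\pi t},$$

where $f(nT)$ is the value of $f$ at $nT \in \mathbf{R}$, and where $T_{nT} d_{2\pi\Omega}$ is the translation of $d_{2\pi\Omega}$ by $nT$.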

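Proof: Let $g = (2\Omega)^{-1/2} d_{2\pi\Omega}$ so that $\hat{g} = (2\Omega)^{-1/2} \mathbf{1}_{(\Omega)}$ and $\|g\|_2 = 1$. Set $a = T$ and $b = 2\Omega$, so that $ab = 2T\Omega \le 1$. Note that

$$G(\gamma) = \sum_m |\hat{g}(\gamma - mb)|^2 = \frac{1}{2\Omega} \quad \text{a.e.},$$

$\operatorname{supp} \hat{g} \subseteq [-\Omega, \Omega]$, and $|[-\Omega, \Omega]| \le 1/a$. Thus, by Theorem 2.7, $\{E_{na}T_{mb}\hat{g}\}$ is a frame. Consequently, by Theorem 2.2a and Theorem 2.8,

(3.2) $$\forall f \in L^2(\mathbf{R}),\quad \hat{f} = \sum_{m,n} \langle \hat{f}, E_{na}T_{mb}S^{-1}\hat{g} \rangle\, E_{na}T_{mb}\hat{g} \quad \text{in } L^2(\hat{\mathbf{R}}).$$

Since $\operatorname{supp} \hat{g}$ is compact, we have

$$\forall h \in L^2(\hat{\mathbf{R}}),\quad S^{-1}h = 2T\Omega\, h$$

by Theorem 2.7; and, hence, (3.2) becomes

(3.3) $$\forall f \in L^2(\mathbf{R}),\quad \hat{f} = 2T\Omega \sum_{m,n} \langle \hat{f}, E_{na}T_{mb}\hat{g} \rangle\, E_{na}T_{mb}\hat{g} \quad \text{in } L^2(\hat{\mathbf{R}}).$$

If $f \in PW_\Omega$ then

(3.4) $$\langle \hat{f}, E_{na}T_{mb}\hat{g} \rangle = \begin{cases} (2\Omega)^{-1/2} f(-na), & \text{for } m = 0, \\ 0, & \text{for } m \ne 0. \end{cases}$$

The sampling formula, (3.1), follows from (3.3) and (3.4). ∎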

The hypothesis, that $f \in PW_\Omega$, was essential in both parts of (3.4); and the above proof shows that only "$t$-information" (i.e., $m = 0$) is required in this case. When $f$ is not $\Omega$-bandlimited so that aliasing occurs, phase information contributed by $m \ne 0$ is required in the frame decomposition of a signal. To quantify this remark, we define the aliasing
pseudomeasure,  0!t,n,  on  R  as  the  distributional  Founer  transform,  nt.n  = 
each  t  is  fixed  and 

At.n  =  -  l)(r2„nl<n))  €  I‘*(R). 

3.2  Calculation/ Definition.  Let  /  e  L*(R)  and  assume  2TQ.  =  1.  Writing  (3.3)  as  a 
sum,  Em=o,n  +  Em^o.ni  Compute 

(3.5)  m  =  T'£K  nT)TnT<i2ifu(,i)  +  P  *  <^t,n){nT)  TnT  d2irn(0- 

The  aliasing  error  of  /  at  t  for  the  low  pass  filter  d27rn  is 

ae{f,t)  =  T *  Qt,n){nT)TnTd2rn{t)- 


Formally,  standard  calculations  give 

(3.6)  llae(/,.)|Ioo<2  /  |/(7)|d7. 

’'lTl>n 

< 

In the following result we use sampling kernels $s$ with more rapid decay than $d_{2\pi\Omega}$. The goal is better computational efficiency for low pass filters; the price to be paid is more sampling.




3.3  Theorem.  Given  >  0,  for  which  0<T<^,  and  g  €  wth  the  properties 
that  suppg  C  [-j^,  2y]>  ff  =  1  on  [— Q],  and  g  >  0  on  (jf,  — fi]  U  [ft,  ).  Set 

^(y)  =  sit)  = 

where  ft  +  jT  <  ^  <  7-  Then  0  <  A  <  G(y)  <  B  <  oo,  s  £  5(R),  supps  —  suppg ,  s  =  -^ 
on  [—ft,  ft],  and 

(3.7)  '^fePWa,  f=TY,f{”T)T„TS  inL^(R). 

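Proof: The assertions about $G$ and $s$ follow from our choice of $b$.

Set $a = T$ so that $|\operatorname{supp} \hat{g}| = 1/a$. Thus, using the fact, $A \le G(\gamma) \le B$, and Theorem 2.7, we see that $\{E_{na}T_{mb}\hat{g}\}$ is a frame. Since $\operatorname{supp} \hat{g}$ is compact, we have

$$\forall h \in L^2(\hat{\mathbf{R}}),\quad S^{-1}h = \frac{T h}{G}$$

by Theorem 2.7; and, hence, we have the frame decomposition

(3.8) $$\forall f \in L^2(\mathbf{R}),\quad \hat{f} = T \sum_{m,n} \langle \hat{f}, E_{na}T_{mb}\hat{g} \rangle\, E_{na}T_{mb}\hat{s},$$

where we have used the fact that $S^{-1}(E_{na}T_{mb}\hat{g}) = E_{na}T_{mb}S^{-1}\hat{g}$ (Theorem 2.8).

If $f \in PW_\Omega$, then (3.4) is again valid since $\hat{g} = 1$ on $[-\Omega, \Omega]$. The sampling formula (3.7) follows from (3.4) and (3.8). ∎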

3.4 Example. a. In Theorem 3.1, $\{E_{na}T_{mb}\hat{g}\}$ is a tight frame with frame bounds $A = B = 1$ in the case $2T\Omega = 1$, where $a = T$ and $b = 2\Omega$. Clearly, $\langle E_{na}T_{mb}\hat{g}, E_{qa}T_{pb}\hat{g} \rangle$ is 1 if $(m, n) = (p, q)$ and is 0 if $m \ne p$. If $m = p$ and $n \ne q$ then this inner product is

g2iri(2TO)m(n— j) 

Thus, $\{E_{na}T_{mb}\hat{g}\}$ is an orthonormal sequence if and only if $2T\Omega = 1$. Consequently, by Theorem 2.2b, $\{E_{na}T_{mb}\hat{g}\}$ is an orthonormal basis if and only if $2T\Omega = 1$.
b. Suppose $2T\Omega < 1$. To construct $\hat{g} \in \mathcal{S}(\hat{\mathbf{R}})$ satisfying the conditions of Theorem 3.3 we proceed as follows, cf., [H] for a different construction depending on the Pythagorean theorem.

We  begin  in  the  standard  “distributional  way”  by  defining 


Mk) 


-  l7l) 

/^{e-  l7l)<^7’ 


where  <f>  €  C°°(R)  vanishes  on  (— oo,0]  and  equals  on  [0,oo).  Thus,  is 

an  even  function  satisfying  the  conditions,  supp  rpf  =  [— c,c]  and  J  rptil)  d'y  =  1.  Next  set 


1 

V’l/.v  =  *  liz-v',  U,  VCR, 

so  that  tpu,v  is  1  on  and  vanishes  oS  oi  U  +  V  —  V,  The  function  g  will  be  defined  in 
terms  of  y  as  ^  =  tpu.v  *  V’«»  where  we  shall  now  specify  c,  U,  and  V  given  2TQ,  <  1.  Let 
U  =  [— u,tt],  where  u  €  (ft,  is  arbitrary,  and  let  e  =  u  —  ft.  Choose  V  =  [— t>,v]  by 
setting  V  =  -2^^,  where  w  =  ■^+e.  These  choices  are  necessitated  by  a  simple  geometrical 
2u-gument,  and  the  resulting  fimction  g  satisfies  the  desired  properties. 


4.  Weighted  Fourier  frames  for  irregular  lattices. 

In the case of irregular lattices, the following result is the analogue of Theorem 2.7 for $\mathbf{R}$.

4.1 Theorem. Given $\Omega > 0$ and let $g \in PW_\Omega$. Assume that $\{a_n\}$, $\{b_m\}$ are real sequences for which

(4.1) $$\{E_{a_n}\} \text{ is a frame for } L^2[-\Omega, \Omega],$$

and that there exist $A, B > 0$ such that

(4.2) $$0 < A \le G(\gamma) \le B < \infty \quad \text{a.e. on } \hat{\mathbf{R}},$$

where

$$G(\gamma) = \sum_m |\hat{g}(\gamma - b_m)|^2.$$

Then $\{E_{a_n}T_{b_m}\hat{g}\}$ is a frame for $L^2(\hat{\mathbf{R}})$; and $\{E_{a_n}T_{b_m}\hat{g}\}$ is a tight frame for $L^2(\hat{\mathbf{R}})$ if and only if $\{E_{a_n}\}$ is a tight frame for $L^2[-\Omega, \Omega]$ and $G$ is a constant a.e. on $\hat{\mathbf{R}}$.

Proof:  I  =  (— and  set  =  f  +  ^m*  For  fixed  m,  {Ti,^Ea„}  is  a  frame  for  LP'{lm) 
with  frame  bounds  Aj,  Bj  independent  of  m.  Thus,  for  all  /i  €  ^^(R)  for  which  supph  C 
Im ,  we  have 

(4.3)  A,  Ilfclli.,;.,  <  <  Bj 

n 

Take  any  /  €  L^(R).  Because  of  (4.2),  g  €  X°®(R);  and,  hence,  h^j  =  fTiJg  6  L?{Ijn)- 
Also,  since  g  is  ft-bandlimited,  hm,f  vanishes  off  of  7^-  Substituting  hm,/  into  (4.3)  and 
summing  over  m,  we  obtain 

m  m  n  m 

We  now  compute 

=  (/,T,.(j£:..)> 

and,  using  the  fact  that  g  is  fi-bandlimited, 

E  ll/r.  =  /  |/(7)I’(E  \9(i  -  K)?)dl. 

By  these  calculations,  as  well  as  (4.2)  and  (4.4),  we  obtain 

‘  m  n 

Thus,  {Ea„Ti,^g)  is  a  frame  for  7^(R).  The  characterization  of  {7^o„76„,^}  as  a  tight 
frame  follows  immediately  from  (4.5).  | 
4.2 Corollary. Given the hypotheses of Theorem 4.1 and set $I_m = [-\Omega, \Omega] + b_m$.

For each fixed $m$, $\{T_{b_m}E_{a_n}\}$ is a frame for $L^2(I_m)$ with frame operator $S_m$, cf., (4.3), $\{E_{a_n}T_{b_m}\hat{g}\}$ is a frame for $L^2(\hat{\mathbf{R}})$ with frame operator $S$, and

$$\forall h \in L^2(\hat{\mathbf{R}}),\quad Sh = \sum_m (T_{b_m}\hat{g})\, S_m\big(h\, T_{b_m}\overline{\hat{g}}\big).$$

Proof:  We  compute 

m  n 

m  n 

m  n 

—  “EbmQ  Em{hTb^g).  I 
m 

If "$\hat{g}$" is any Borel measurable function for which $G(\gamma) \le B$ a.e. on $\hat{\mathbf{R}}$, then $\hat{g} \in L^\infty(\hat{\mathbf{R}})$. The converse is a part of the following result.

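4.3 Theorem. Given $\Omega > 0$. Assume that $\{a_n\}$, $\{b_m\}$ are real sequences for which $\{E_{a_n}\}$ is a frame for $L^2[-\Omega, \Omega]$, and that there exist $d, D > 0$ such that

(4.6) $$\forall m,\quad 0 < d \le b_{m+1} - b_m \le D < 2\Omega,$$

where $\lim_{m \to \pm\infty} b_m = \pm\infty$. Suppose $g \in PW_\Omega$ has the properties that $\hat{g} \in L^\infty(\hat{\mathbf{R}})$ and $A = \inf\{|\hat{g}(\lambda)|^2 : \lambda \in I\} > 0$ for some interval $I \subseteq [-\Omega, \Omega]$ having measure $|I| = D$. Then $\{E_{a_n}T_{b_m}\hat{g}\}$ is a frame for $L^2(\hat{\mathbf{R}})$.

Proof: It suffices to verify condition (4.2) of Theorem 4.1.

For each $\gamma$, $G(\gamma)$ is a finite sum; and, in fact, this sum has at most $[\frac{2\Omega}{d}] + 1$ terms. Thus,

$$\forall \gamma,\quad G(\gamma) \le \Big( \Big[\frac{2\Omega}{d}\Big] + 1 \Big) \|\hat{g}\|_\infty^2 = B < \infty,$$

and the upper bound is obtained.

For each $\gamma \in \hat{\mathbf{R}}$ there is a $b_m$ such that $\gamma - b_m \in I$. Thus,

$$G(\gamma) \ge |\hat{g}(\gamma - b_m)|^2 \ge A > 0,$$

and the lower bound is obtained. ∎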
4.4 Remark. a. Consider condition (4.1), used in both Theorem 4.1 and 4.3.

a.i. A sequence $\{a_n\} \subseteq \mathbf{R}$ has uniform density $\Delta > 0$ if there exist constants $L$ and $d$ such that

$$\forall n,\quad \Big| a_n - \frac{n}{\Delta} \Big| \le L$$

and

$$\forall n \ne m,\quad |a_n - a_m| \ge d > 0.$$

Duffin and Schaeffer [DS] proved that if $\{a_n\}$ has uniform density $\Delta > 0$ and $0 < 2\Omega < \Delta$ then $\{E_{a_n}\}$ is a frame for $L^2[-\Omega, \Omega]$. For a given sequence $\{a_n\} \subseteq \mathbf{R}$ let $\Omega_r$ be the least upper bound of all $\Omega$ for which $\{E_{a_n}\}$ is a frame for $L^2[-\Omega, \Omega]$; $\Omega_r$ is the frame radius of $\{a_n\}$. Duffin and Schaeffer's theorem can be rephrased and refined as follows: if $\{a_n\}$ has uniform density $\Delta > 0$ then $\Omega_r \ge \frac{\Delta}{2}$.

Important work on this topic is due to [La; J], cf., [H]. We mention the following fact which follows from [DS; J]. Suppose $\{E_{a_n}\}$ is an exact frame for $L^2[-\Omega, \Omega]$. Then $\{E_{a_n}\}$ is not a frame for $L^2[-\Omega_1, \Omega_1]$ for any $\Omega_1 > \Omega$, and $\{E_{a_n}\}$ is an inexact frame for $L^2[-\Omega_1, \Omega_1]$ for every $0 < \Omega_1 < \Omega$. In this latter case we can remove any finite number of arbitrarily selected elements of $\{a_n\}$ and still have a frame for $L^2[-\Omega_1, \Omega_1]$.

a.ii. If $a_n = na$ and $a = \frac{1}{2\Omega}$ then $\{E_{na}\}$ is an orthonormal basis of $L^2[-\Omega, \Omega]$. The sequence $\{na\}$ has uniform density $\Delta = 2\Omega$.
b. i.  Given  the  hypotheses  o/ Theorem  4.1  in  the  case  a„  =  na  and  a  =  jn-  Then 

(4.7)  V/€i^(R),  S-'f=±l 

To  verify  (4.7)  note  that  { is  an  orthonormal  basis  of  each  L^{Im)  and  that 
/  Tb„g  €  L^{Im).  Since  {E„aTb^g]  is  a  frame  for  L^(R)  we  have 

5/  =  EE 

«  m  n 

f-*  S)  =  E  (r»„9)(E 

tn  n 

=  2ClfG. 
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Using  S~^  f  instead  of  /  in  (4.8)  we  obtain  (4.7). 

b.ii.  In  Theorem  3.3  we  used  the  commutativity  of  the  operators  S~^  and  EnaTmb  iii 
proving  the  sampling  formula. 

Now  suppose  we  have  the  hypotheses  of  Theorem  4.1  in  the  case  a„  =  na  and  o.  = 
Then  by  part  b.i.  we  have  (4.7),  so  that 


S-\Enan„g) 


2SI  G 


On  the  other  hand, 


EnaT.S-^g  = 


I  EnqEl^g 

212  T,^G  ’ 


so  that  the  operators  S  ^  and  EnaTi,„  a.re  not  commutative  for  irregular  sequences  {fem}- 
4.5  Example.  Given  the  hypotheses  of  Theorem  4.1.  Then 


V  /  €  £’(R),  g))E^.T,„g  in  1^(R), 

and  so 


(4.9)  V/€I’(R),  /(<)  =  E 

where 

Cn(()  =  E  C!.^-'‘’''"^~S-\E.,%^g))EiJt). 

m 

With  various  further  hypotheses,  (4.9)  will  be  a  “sampling”  formula,  cf..  Theorem  6.2, 
The  point  we  make  now  is  that  the  frequencies  for  Fourier  frames  on  R.  provide  the  trans¬ 
lation  points  on  R  for  sampling  formulas. 


5. Irregular sampling - sampled coefficients and exact frames.

The theory of non-harmonic Fourier series was developed by Paley and Wiener [PW, Chapters 6 and 7] and Levinson [L, Chapter 4]. Related work preceding [PW] is due to G. D. Birkhoff (1917), J. L. Walsh (1921), and Wiener (1927). The Paley-Wiener and Levinson theory has been reformulated and analyzed in terms of irregular sampling by Beutler [Be1; Be2] for completeness and Yao and Thomas [YT] for expansions. The Yao and Thomas expansion was discovered independently by Higgins [Hi1] using reproducing kernels; there is also the interesting new work by Rawn [R]. In this section we shall state and prove this irregular sampling expansion by frame methods. The coefficients in the expansion are the values of the given signal at the given irregularly spaced sampling points, cf., Section 6.

Whereas we implemented $S^{-1}$ as a multiplier in Section 3, in this section we shall invoke a formula, viz., (5.1), related to the fact that $\{S^{-1}g_n\}$ is the unique biorthonormal sequence associated to a given exact frame $\{g_n\}$, cf., Theorem 2.2c.

5.1 Proposition. Let $H$ be a separable Hilbert space and let $\{g_n\} \subseteq H$ be an exact frame with inverse frame operator $S^{-1}$. Then

(5.1) $$\forall f \in H,\quad S^{-1}f = \sum_n \langle f, h_n \rangle h_n,$$

where $\{h_n\}$ is the unique biorthonormal sequence associated with $\{g_n\}$. In particular, $\{S^{-1}g_n\} = \{h_n\}$, and so $S^{-1}$ is the frame operator of the dual frame $\{S^{-1}g_n\}$.

Proof: Since $\{g_n\}$ is exact, $\{g_n\}$ and $\{S^{-1}g_n\}$ are biorthonormal (Theorem 2.2c); and since $\{g_n\}$ is complete, we see that $\{S^{-1}g_n\}$ is the unique biorthonormal sequence associated with $\{g_n\}$. (5.1) follows immediately from Theorem 2.2a. ∎

5.2  Theorem.  Given  fl  >  0  and  {a„}  C  R,  let  t„  =  —a„,  and  assume  {iS'on}  is  an  exact 
frame  for  L^[— f2,n].  Define  s„(t)  in  terms  of  its  involution  s„(i)  =  where 


(5.2) 


Vt  €  R, 


h„(7)e2"'^d7, 


and  where  {h„}  is  the  unique  biorthonormal  sequence  associated  with  (In  partic¬ 

ular,  Sr,  €  PWfl.)  Then 


(5-3)  Vf€PWa.  /  =  tni^{R). 
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where  Sn(<)  =  5„(— <)  € 

Proof:  Let  g  =  d2nU  and  set  =  2Slm.  Note  that 

^(7)  =  1^(7  -  ^  a-e. 

and  supp g  C  [-ft,  fi].  Thus,  since  }  is  a  frame,  we  can  apply  Theorem  4.1  to  obtain 
that  {Ea„Tb„g}  is  a  frame  for  I^(R)  with  frame  operator  S.  In  particular, 

(5.4)  VA€£’{R),  i  =  ^(i.£..r..s)5->(£..Ti.j)  in  L^R)- 

Similarly  to  (3.4),  we  obtain 

^55^ /(-<.„),  ifm  =  0 
0,  1/  m  ^  0 

for  /  €  PWq. 

Let  5m  be  the  frame  operator  for  the  frame  {Tb^Ea„)  for  L’^{Im),  where  Im  =  [-f^,  + 

bm-  By  Corollary  4.2,  we  have 

(5.6)  Vh€L'(R),  5h  =  ^Tt„s5m(hrtJ)  in  ^^(r). 

m 

From  (5.6)  and  the  definition  of  g  we  compute 

Sf=-^'Z,  *«»(''  -  S’-(H-)  !(«)(■  -  2nm))(7) 

m 

=  ^5o/(7) 

for  /  €  PWq,  where  the  second  equedity  follows  since  supp  f  C  [— n,fl].  Thus, 

(5.7)  .V  /  €  PWtt,  Sj  =  i  So/, 

i.e.,  the  action  of  5  on  L^[— fl,n]  can  be  realized  by  the  action  of  ^  So  on  that  same 
subspace  of  Z«^(R). 
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Using  (5.7),  we  can  write 

vfePWo,  f  =  S-'Sf 

so  that,  if  we  replace  /  by  we  obtain 

S-^=2nSo' 

Therefore,  since  g  €  PW^t 

S-\E.,T,,g)  =  S-\E.J) 

=  2ns;\E.j) 

=  (20)*/^ 

m 

=  (2S})''^h., 

where  the  penultimate  equality  depends  on  the  exactness  hypothesis  and  Proposition 
5.1.  Substituting  this  information  into  (5.4)  and  (5.5)  gives  the  reconstruction, 

V/€PW„,  /  =  ^^_^/(-a„)(2n)>/X  mi*(R), 

which,  in  turn,  yields  (5.3).  | 

5.3 Remark. Levinson [L, Theorem 18] proved that if $\Omega > 0$ and $\{a_n\} \subseteq \mathbf{R}$ satisfy

(5.8) $$\sup_n |n - 2\Omega a_n| < \frac{1}{4},$$

then $\{E_{a_n}\}$ is complete in $L^2[-\Omega, \Omega]$ and has a unique biorthonormal sequence $\{h_n\}$. Kadec (1964) [Ka] provided the direct calculation proving that $\{E_{a_n}\}$ is a Riesz basis, i.e., exact frame, if (5.8) holds, cf., [Y, pp. 34-36] for a characterization to ensure that complete sets with associated biorthonormal sequences are Riesz bases.

The bound "1/4" in (5.8) is best possible [L, Theorem 19].
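Condition (5.8) is easy to test on a finite window of a proposed sampling sequence. The helper below is a sketch only: the function names and the dictionary representation of $\{a_n\}$ are assumptions made for illustration, and a finite check can refute (5.8) but cannot establish it for the whole infinite sequence.

```python
import math

def kadec_levinson_gap(a, omega):
    """Return max_n |n - 2 * Omega * a_n| over the finite index set in `a`.

    `a` maps each integer index n to the frequency a_n (hypothetical layout).
    """
    return max(abs(n - 2.0 * omega * a_n) for n, a_n in a.items())

def satisfies_5_8(a, omega):
    """True when the finite window of the sequence meets the 1/4 bound in (5.8)."""
    return kadec_levinson_gap(a, omega) < 0.25

# Example: jittered integers with Omega = 1/2, so that 2 * Omega * a_n = a_n.
jittered = {n: n + 0.2 * math.sin(n) for n in range(-50, 51)}
shifted = {n: n + 0.3 for n in range(-50, 51)}
```

Here `jittered` stays within 0.2 of the integers and passes the test, while `shifted` is displaced by 0.3 and fails it.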

The  explicit  formulas  in  the  following  result  are  proved  in  [PW,  pp.  89-90  and  pp. 
114-116]  and  [L,  pp.  i  .ff].  The  calculations  by  Paley  and  Wiener  were  refined  by  Young 
(1979),  e.g.,  [Y,  pp.  148-150].  The  remainder  of  the  proof  is  referenced  in  Remark  5.3. 
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5.4  Theorem.  Given  n  >  0  and  {a„}  C  R,  and  assume  (5.8).  Then  {J?o„}  is  an  exact 
frame  for  with  unique  biorthonormal  sequence  {/in};  snd  defined  by  (5.2), 

is 


(5.9) 


5n(0 


^(0 

5'(an)(<  -a„) 


where 

(5.10)  5(0  =  (t  -  ao)  n(l  -  ;^)(1  -  -^)- 

On  a-n 

n=l 

5.5  Example.  Note  that  the  sampling  functions  s„,  defined  in  Theorem  5.2,  are  given 
by 

where 

«=1 

and  they  have  the  property  that 


(5.11)  V  fTI,  n  Sn(tjji')  —  (/ln>.^am)n  —  ^mn- 

This  property  of  sampling  fimctions  is  usually  described  by  saying  that  {sn}  is  a  sequence 
of  Lagrangia  interpolating  functions. 


6. Irregular sampling - irregular translates of a sampling function.

Our basic result in this section, Theorem 6.2, is dual to the sampling theorem, Theorem 5.2, in the following way. Exact frames were required in Section 5 and the sampled values of the signal were explicit in the dual frame expansion. Theorem 6.2 will use general frames, and the frame expansion will only require the irregular translates of a single sampling function. The dual frame expansion was global in Section 5 and the frame expansion is local in this section.

The following fact is clear.
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C.l  Lemma.  Given  /  /„  t  I*(R),  aud  fussuiiiC  /  =  iij  I'‘^(R).  If  ft  t  ihfU 

ffi  =  Ylfv9 

G.2  Theorem.  Giveu  >  0  mid  {a„)  c  R,  let  t„  =  -a„,  mid  ussuinv  <•-  finnn  fo: 

^  for  some  fii  >  fl,  with  frame  operator  S.  Let  fi  €  «5>{R)  have  the  jjioj/ei  iic' 
that  siipjig  C  [— Q],Qj]  and  g  =  1  on  [-Q,^].  Then 

(C.l)  VfePU'ii,  f  =  e  I‘(R). 

where  s  =  g.  ("We  choose  “s”  sj'jjcc  it  represents  the  “smnpling”  function.) 

Proot;  Since  {£'„„}  is  a  frame  for  L^[  — and  sup])  f  C  [  — we  have* 

/  =  /l(n.) 

(C.2)  =i;;(^‘‘dl(n.))..E„.)|.n,.„,,  ...  i’(R). 

In  this  expression,  we  note  that  5~',  being  positive,  is  self-adjoint  so  that  the  Iranie 
expansion  in  Theorem  2.2a  gives  rise  to  (G.2).  Also,  the  L^[— Qj,Qi]  convergence  from 
our  frame  hypothesis  can  be  taken  to  be  in  1.^(6.)  by  extending  all  functions  to  be  '/ero 
outside  [— fli ,  Hi]. 

We  have  /  =  /y  on  R  since  ^  =  1  on  [— Q,Q]  and  /  =  0  off  of  [— f2,  fi].  .A.lso. 

=  ^(5->(/l(n.)).  £:„„)i_n,.n.]  E„„\n,^g  n,  L^R) 

by  Lemma  6.1.  Thus,  since  supp  g  C  [— flj.Qi],  we  obtain 
f  =  f<J 

=  J^{S-’(/l(n,)),  Ea„)\-n,p.]EaJj  I‘(R). 


Taking  the  inverse  transform  gives  (C.l).  | 

The following result allows us to be more explicit about the coefficients in (6.1) in the case of exact frames and the Levinson (and Kadec) condition (5.8).
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6.3  Theorem.  Given  Q>  0  and  {a„}  C  R,  let  <„  =  — a„,  and  assume 

sup  In  —  2^li  a„|  <  ^ 
n  4 

for  some  Qj  >  Q.  Then  {£?o„}  is  an  exact  frame  for  L*[— with  frame  operator  S 
and  unique  biorthonormaJ  sequence  {/»«}•  Further,  if  we  define  S„  and  s  on  [—^1,0]]  by 
(5.9)  and  (5.10),  then  (s„)*  =  hn  (where  hn  =  0  off  of  [— and  the  coefficients  of 
(6.1)  are 

V  n,  (S-‘(/l(n,)).  =  UW.s.it)), 

where  fePWn. 

Proof:  The  exact  frame  and  biorthonormal  sequence  conclusions  follow  from  Theorem 
5.4,  as  well  as  (5.9),  (5.10),  and  the  relation  (^n)"  =  b„.  Letting  H  =  in 

Proposition  5.1,  we  have 

V  F  €  S-'(F)  =  £(F,  A™)|-n..<,,i 

Consequently,  by  orthonormality, 

V  n,  S  t-^o„)  = 

~  (^a„  1  ^n)[-ni  ,n,) 

=  K. 

Therefore,  setting  /i„  =  0  off  of  [— ni,f2i]  and  noting  that  Sn  is  real-valued,  we  compute 
(5“^(/l(nj)),  £^-i„)(_n,,n,]  =  (/, 5~^(f;o„))[-n,,n,] 

=  (/,  ^n)[-n,,ni) 

=  (/i  ^n)-R_ 

=  (/,  ^'n)R 

=  (/(<),  siFO) 

=  {m.S„{t)), 

for  each  /  €  PWq.  | 
6.4 Algorithm. It is possible to estimate the coefficients in (6.1) without dealing with exact frames. In so doing, we shall see to what extent these coefficients contain information from the sampled values $f(t_n)$.

Let $\{E_{a_n}\}$ be a frame for $L^2[-\Omega_1, \Omega_1]$ with frame operator $S$ and frame bounds $A$ and $B$. Since

(6.3) $$\Big\| I - \frac{2}{A + B} S \Big\| \le \frac{B - A}{A + B} < 1,$$

we have

(6.4) $$S^{-1} = \frac{2}{A + B} \sum_{k=0}^{\infty} \Big( I - \frac{2}{A + B} S \Big)^k,$$

where $I : L^2[-\Omega_1, \Omega_1] \to L^2[-\Omega_1, \Omega_1]$ is the identity map, the norm in (6.3) is the operator norm, and the convergence in (6.4) is in the operator norm topology on the space of continuous linear operators on $L^2[-\Omega_1, \Omega_1]$ (into itself).

Setting $t_n = -a_n$ and using (6.4), the coefficients in (6.1) become

(6.5) $$c_n = \langle S^{-1}(\hat{f}\,\mathbf{1}_{(\Omega_1)}), E_{a_n} \rangle_{[-\Omega_1, \Omega_1]} = \frac{2}{A + B} \sum_{k=0}^{\infty} \Big\langle \Big( I - \frac{2}{A + B} S \Big)^k (\hat{f}\,\mathbf{1}_{(\Omega_1)}), E_{a_n} \Big\rangle_{[-\Omega_1, \Omega_1]}$$

for $f \in PW_\Omega$, $\Omega < \Omega_1$. If we truncate the expansion (6.5) after the $k = 0$ term, then

$$c_n \approx \frac{2}{A + B} f(t_n).$$
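The Neumann series (6.4) can be illustrated on a finite-dimensional frame, where the frame operator is a small matrix. This is a sketch under assumptions: the three-vector frame for $\mathbf{R}^2$ used in the usage note, the helper names, and the use of plain nested lists are all choices made here for illustration; the paper works in $L^2[-\Omega_1, \Omega_1]$.

```python
def frame_operator(vectors):
    """S = sum_n v_n v_n^T for a finite real frame {v_n}."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(n)]
            for i in range(n)]

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_inverse(S, A, B, terms):
    """Partial sum of (6.4): (2/(A+B)) * sum_{k < terms} (I - 2S/(A+B))^k."""
    n = len(S)
    c = 2.0 / (A + B)
    R = [[(1.0 if i == j else 0.0) - c * S[i][j] for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in power]
    for _ in range(terms - 1):
        power = mat_mul(power, R)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return [[c * total[i][j] for j in range(n)] for i in range(n)]
```

For the frame $(1,0), (0,1), (1,1)$ the frame operator has eigenvalues 1 and 3, so $A = 1$ and $B = 3$. Taking `terms=1` reproduces the $k = 0$ truncation, $\frac{2}{A+B} I$, and larger values of `terms` approach the exact inverse at the geometric rate $\frac{B-A}{A+B}$ suggested by (6.3).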


Notation. 

The  Fourier  transform  /  of  /  €  defined  as  /(t)  =  J  dt,  where 

designates  integration  over  the  real  line  R;  /  is  defined  on  R  (=  R)  and  /  is  the  in\erse 
Fourier  transform  of  /.  The  Fourier  transform  is  defined  on  £^(R),  and,  for  fixed  fl  >  0, 

-PW'n  =  {/ €  £^(R) :  supp/  C 
where $\operatorname{supp} \hat{f}$ is the support of $\hat{f}$. A function (or distribution) $f$, whose Fourier transform exists, is $\Omega$-bandlimited if $\operatorname{supp} \hat{f} \subseteq [-\Omega, \Omega]$.

Besides the $L^p(\mathbf{R})$-spaces and the Schwartz space $\mathcal{S}(\mathbf{R})$, we deal with the space $C^\infty(\mathbf{R})$ of infinitely differentiable functions and its subspace $C_c^\infty(\mathbf{R})$ whose elements have compact support.

$\sum$ designates summation over the whole discrete group in question, e.g., over $\mathbf{Z} \times \mathbf{Z}$ where $\mathbf{Z}$ is the group of integers. The function $\mathbf{1}_S$ is the characteristic function of $S \subseteq \mathbf{R}$, $|S|$ is the Lebesgue measure of $S$, and $\mathbf{1}_{(\Omega)} = \mathbf{1}_{[-\Omega, \Omega)}$. The function $\delta_{mn}$ is defined as 0 if $m \ne n$ and as 1 if $m = n$. The dilation $f_\lambda$ of the function $f$ is $f_\lambda(t) = \lambda f(\lambda t)$, and the translation $T_{t_0}f$ is $T_{t_0}f(t) = f(t - t_0)$. Finally, the exponential function $E_a$ is

Ea{t)  = 
ABSTRACT

We propose a shift-invariant multiresolution representation of signals or images using dilations and translations of the auto-correlation functions of compactly supported wavelets. Although this set of functions does not form an orthonormal basis, a number of properties of the auto-correlation functions of the compactly supported wavelets makes them useful for signal and image analysis. Unlike wavelet-based orthonormal representations, our representation has (1) symmetric analyzing functions, (2) shift-invariance, (3) natural and simple iterative interpolation schemes, (4) a simple algorithm for finding the locations of the multiscale edges as zero-crossings.

We also develop a non-iterative method for reconstructing signals from their zero-crossings (and slopes at these zero-crossings) in our representation. This method reduces the problem to that of solving a system of linear equations.


1.  INTRODUCTION 

The information about the local behavior of a function is hidden in the decay (or growth) from scale to scale of the coefficients of the orthonormal wavelet expansions. Exploiting this property in applications, however, is not a straightforward exercise in part due to the fact that the coefficients of the orthonormal wavelet expansions are not shift invariant. In implementing multiresolution algorithms for image processing and signal analysis, redundant representations are being used in order to simplify the analysis of coefficients from scale to scale.

Another difficulty in utilizing the orthonormal wavelets for the analysis of signals in image processing is associated with the asymmetric shape of compactly supported wavelets [4]. On the one hand, using compactly supported wavelets implies that the associated exact quadrature mirror filters are of finite size which is advantageous in computer implementations. On the other hand, the symmetric basis functions are preferred in image processing since their use simplifies finding zero-crossings (or extrema) corresponding to the locations of edges in images at later stages of processing. There are two approaches for dealing with this problem. The first approach consists in constructing approximately symmetric orthonormal wavelets and gives rise to approximate quadrature mirror filters [5]. The second consists in using biorthogonal bases [3] so that the basis functions may be chosen to be exactly symmetric.

In this paper, we propose a "hybrid" multiresolution representation which utilizes dilations and translations of the auto-correlation functions of compactly supported wavelets. In this representation, the exact filters for the decomposition (similar to the quadrature mirror filters) are symmetric. The auto-correlation functions of the compactly supported wavelets may be viewed as pseudo-differential operators of even order and behave, essentially, as derivative operators of the same order. This allows us to relate the zero-crossings in this representation to the locations of edges at different scales in the signal. The recursive definition of compactly supported wavelets and, therefore, of their auto-correlation functions, allows us to construct fast recursive algorithms to generate the multiresolution representations. Though it is not an orthogonal representation, there is a simple relation with the wavelet-based orthogonal representations on each scale. We describe a simple reconstruction algorithm to recover functions from such expansions.

2.  AN  ORTHONORMAL  SHELL 

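In this section, we introduce a shift-invariant representation using orthonormal wavelets. We refer to the set of functions

$$\{\tilde{\psi}_{j,k}(x)\}_{1 \le j \le n_0,\ 0 \le k \le N-1} \quad\text{and}\quad \{\tilde{\varphi}_{n_0,k}(x)\}_{0 \le k \le N-1}$$

as a shell of the orthonormal wavelets (an orthonormal shell for short), where

$$\tilde{\varphi}_{j,k}(x) = 2^{-j/2} \varphi(2^{-j}(x - k)),$$

$$\tilde{\psi}_{j,k}(x) = 2^{-j/2} \psi(2^{-j}(x - k)),$$

$\varphi(x)$ and $\psi(x)$ are the scaling function and the basic wavelet. Note that for each $j$, $\{\tilde{\varphi}_{j,k}\}$ and $\{\tilde{\psi}_{j,k}\}$ are $2^j$ times more redundant than $\{\varphi_{j,k}\}$ and $\{\psi_{j,k}\}$. Let $V_0$ be the vector space representing the finest scale of interest. The orthonormal shell coefficients of a function $f \in V_0$, $f = \sum_{k=0}^{N-1} s_k^0 \varphi_{0,k}$, are

$$\{d_k^j\}_{1 \le j \le n_0,\ 0 \le k \le N-1} \quad\text{and}\quad \{s_k^{n_0}\}_{0 \le k \le N-1},$$

where the coefficients $s_k^j$ and $d_k^j$ are defined as

$$s_k^j = \int f(x)\, \tilde{\varphi}_{j,k}(x)\, dx,$$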
Essentially, we do not subsample at each scale. We note that the computational diagram of this algorithm is essentially identical to the Hierarchical Discrete Correlation scheme (HDC) [2] of P. Burt, which was designed for efficient correlation of images at multiple scales. This representation is redundant but contains all orthonormal wavelet coefficients of all circular shifts of the original signal [1], and its computational cost is $O(N \log_2 N)$. However, due to the asymmetric shape of the compactly supported orthonormal wavelets, this representation might still be inconvenient for signal analysis purposes.

3. AN AUTO-CORRELATION SHELL

Instead of the compactly supported wavelets we use their auto-correlation functions, i.e.,

$$\Phi(x) = \int_{-\infty}^{+\infty} \varphi(y)\, \varphi(y - x)\, dy,$$

$$\Psi(x) = \int_{-\infty}^{+\infty} \psi(y)\, \psi(y - x)\, dy,$$

to overcome a number of difficulties associated with the representations in the orthonormal shell.

3.1. Properties of the Auto-Correlation Functions

Let us summarize some useful properties of the auto-correlation functions of compactly supported wavelets. Orthogonality of the wavelet bases implies that

$$\Phi(k) = \delta_{0k} \quad\text{and}\quad \Psi(k) = \delta_{0k},$$

or equivalently,

$$\hat{\Phi}(\xi) + \hat{\Psi}(\xi) = \hat{\Phi}(\xi/2),$$

$$\Psi(x) = 2\Phi(2x) - \Phi(x).$$

(Compare this, e.g., with the approximation of the Mexican Hat function by the difference of two Gaussian functions.) It is easy to derive the following two-scale difference equations for the functions $\Phi(x)$ and $\Psi(x)$:

$$\Phi(x) = \Phi(2x) + \frac{1}{2} \sum_{l=1}^{L/2} a_{2l-1} \big( \Phi(2x - 2l + 1) + \Phi(2x + 2l - 1) \big),$$

$$\Psi(x) = \Phi(2x) - \frac{1}{2} \sum_{l=1}^{L/2} a_{2l-1} \big( \Phi(2x - 2l + 1) + \Phi(2x + 2l - 1) \big),$$

where $\{a_k\}$ are the auto-correlation coefficients of the quadrature mirror filter $H = \{h_k\}_{0 \le k \le L-1}$,

$$a_k = 2 \sum_{l=0}^{L-1-k} h_l h_{l+k} \quad \text{for } k = 1, \ldots, L - 1,$$

$$a_{2k} = 0 \quad \text{for } k = 1, \ldots, L/2 - 1.$$
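The auto-correlation coefficients are straightforward to compute from a given filter. The sketch below assumes the standard four-tap Daubechies filter, normalized so that $\sum_k h_k = \sqrt{2}$, as the example filter `DAUB4`; that choice and the function name are illustrative and not prescribed by the text.

```python
import math

def autocorrelation_coefficients(h):
    """a_k = 2 * sum_{l=0}^{L-1-k} h_l h_{l+k} for k = 1, ..., L-1."""
    L = len(h)
    return {k: 2.0 * sum(h[l] * h[l + k] for l in range(L - k))
            for k in range(1, L)}

# Assumed example filter: Daubechies filter with L = 4 taps.
_s3 = math.sqrt(3.0)
_norm = 4.0 * math.sqrt(2.0)
DAUB4 = [(1 + _s3) / _norm, (3 + _s3) / _norm,
         (3 - _s3) / _norm, (1 - _s3) / _norm]

a = autocorrelation_coefficients(DAUB4)
```

For this filter the even-indexed coefficient $a_2$ vanishes, as the text states, while $a_1 = 9/8$ and $a_3 = -1/8$; halving these gives the weights $9/16$ and $-1/16$ in the two-scale equation for $\Phi$.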


The coefficients $\{a_{2k-1}\}_{1 \le k \le L/2}$ were used in [1] for computing representations of derivatives and convolution operators in the bases of compactly supported wavelets. They also have compact supports and vanishing moments:

$$\operatorname{supp} \Phi(x) = \operatorname{supp} \Psi(x) = [-L + 1, L - 1],$$

$$\int_{-\infty}^{+\infty} x^m \Psi(x)\, dx = 0, \quad \text{for } 0 \le m < L,$$

$$\int_{-\infty}^{+\infty} x^m \Phi(x)\, dx = 0, \quad \text{for } 1 \le m < L,$$

$$\int_{-\infty}^{+\infty} \Phi(x)\, dx = 1.$$

In addition to these properties, we have

• a complete symmetry of the functions $\Phi(x)$ and $\Psi(x)$.

• $\hat{\Psi}(\xi) \sim \xi^L$, which means that the operator of convolution with $\Psi(x)$ behaves essentially as a differential operator $(d/dx)^L$.

• an iterative interpolation scheme induced by the function $\Phi(x)$. By choosing appropriate wavelets, the corresponding auto-correlation functions $\Phi(x)$ produce a whole range of interpolation schemes starting from the linear interpolation and up to the band-limited interpolation (generated by the sinc function) [8].

3.2.  An  Auto-Correlation  Shell 

Now we define a multiresolution representation using the auto-correlation functions of wavelets. We refer to the set of functions

$$\{\Psi_{j,k}(x)\}_{1 \le j \le n_0,\ 0 \le k \le N-1} \quad\text{and}\quad \{\Phi_{n_0,k}(x)\}_{0 \le k \le N-1}$$

as a shell of the auto-correlation functions of orthonormal wavelets (an auto-correlation shell for short), where

$$\Phi_{j,k}(x) = 2^{-j/2}\,\Phi(2^{-j}(x-k)),$$

$$\Psi_{j,k}(x) = 2^{-j/2}\,\Psi(2^{-j}(x-k)).$$

The  relation  to  the  orthonormal  shell  is  derived  as  follows. 
First,  we  define  functions  based  on  orthonormal  shell  coef¬ 
ficients. 

N-l 

k=0 

N-l 

ft:  =  0 

Then, convolving $f_s^j(x)$ and $f_d^j(x)$ with $2^{-j}\varphi(-2^{-j}x)$ and $2^{-j}\psi(-2^{-j}x)$ respectively, we obtain

$$A_s^j f(x) = \int f_s^j(y)\, 2^{-j}\varphi(2^{-j}(y-x))\,dy,$$

$$A_d^j f(x) = \int f_d^j(y)\, 2^{-j}\psi(2^{-j}(y-x))\,dy.$$
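The coefficients $p_k$ and $q_k$ of the filters used in Section 3.3 are

$$p_k = \begin{cases} 2^{-1/2} & \text{for } k = 0,\\ 2^{-3/2}\,a_{|k|} & \text{for } k = \pm 1, \pm 3, \ldots, \pm(L-1),\\ 0 & \text{for } k = \pm 2, \pm 4, \ldots, \pm(L-2), \end{cases}$$

and

$$q_k = \begin{cases} 2^{-1/2} & \text{for } k = 0,\\ -p_k & \text{otherwise.} \end{cases}$$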
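The definitions of $a_k$, $p_k$ and $q_k$ can be checked numerically. The following is a minimal sketch, not code from the paper: the function name `autocorr_filters` and the storage convention (index $k$ stored at offset $k + L - 1$) are assumptions made for illustration, and the Haar and 4-tap Daubechies filters are the standard orthonormal ones.

```python
import numpy as np

def autocorr_filters(h):
    """Return (a, p, q) for a quadrature mirror filter h of even length L.

    a[k] = 2 * sum_l h[l] * h[l + k] for k = 1..L-1 (a[0] is unused).
    p and q are indexed by k = -(L-1)..(L-1), stored at offset k + L - 1.
    """
    h = np.asarray(h, dtype=float)
    L = len(h)
    a = np.zeros(L)
    for k in range(1, L):
        a[k] = 2.0 * np.dot(h[:L - k], h[k:])
    p = np.zeros(2 * L - 1)
    for k in range(-(L - 1), L):
        if k == 0:
            p[k + L - 1] = 2.0 ** -0.5
        elif k % 2 != 0:                      # odd k, positive or negative
            p[k + L - 1] = 2.0 ** -1.5 * a[abs(k)]
        # even non-zero k: p_k stays 0
    q = -p
    q[L - 1] = 2.0 ** -0.5                    # q_0 = 2^{-1/2}
    return a, p, q

# Haar filter (L = 2) and the 4-tap Daubechies filter (L = 4).
h_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
s3 = np.sqrt(3.0)
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))

a_haar, p_haar, q_haar = autocorr_filters(h_haar)
a_d4, p_d4, q_d4 = autocorr_filters(h_d4)
```

Under these assumptions, $\sqrt{2}\,p_k$ comes out as $\{1/2, 1, 1/2\}$ for Haar and $\{-1/16, 0, 9/16, 1, 9/16, 0, -1/16\}$ for the 4-tap filter, and $p_k + q_k$ vanishes except at $k = 0$.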


Finally,  we  set 

N-l 

kssO 

N-l 

-4i/(*)  =  H  -  fc)- 

kmO 

The coefficients $\{S_k^j\}$ and $\{D_k^j\}$ are defined as “samples” of $A_s^j f(x)$ and $A_d^j f(x)$, i.e.,

$$S_k^j = A_s^j f(k) \quad\text{and}\quad D_k^j = A_d^j f(k).$$

The auto-correlation shell coefficients of a function $f \in V_0$, $f = \sum_{k=0}^{N-1} s_k^0\,\varphi_{0,k}$, are $\{D_k^j\}_{1 \le j \le n_0,\ 0 \le k \le N-1}$ and $\{S_k^{n_0}\}_{0 \le k \le N-1}$.

As easily seen, $\{\Phi_{j,k}\}$ and $\{\Psi_{j,k}\}$ are not orthonormal bases of $V_j$ and $W_j$ anymore. However, the representation of functions in the auto-correlation shell has the following features:

•  it  is  shift-invariant  and  contains  the  coefficients  of  all 
circular  shifts  of  the  original  signal, 

•  it  is  convertible  to  the  orthonormal  shell  representa¬ 
tion, 

•  it is completely symmetric,

•  zero-crossings  of  the  auto-correlation  shell  coefficients 
correspond  to  the  multiscale  edges, 

•  the computational cost is $O(N \log_2 N)$.

We also point out an important relation between the original coefficients $\{s_k^0\}$ and the auto-correlation shell coefficients.

Proposition  1 

n-\  N-l 

L  =  E 

ibssO  JbsO 

N-l  N-l 

E  =  E 

k=0  k=0 

See  [8]  for  the  proof. 

3.3.  Fast Decomposition and Reconstruction Algorithms

Rewriting the two-scale difference equations for the auto-correlation functions, we have

$$\frac{1}{\sqrt{2}}\,\Phi(x/2) = \sum_{k=-L+1}^{L-1} p_k\,\Phi(x-k),$$

$$\frac{1}{\sqrt{2}}\,\Psi(x/2) = \sum_{k=-L+1}^{L-1} q_k\,\Phi(x-k).$$

We view these coefficients as filters $P = \{p_k\}_{-L+1 \le k \le L-1}$ and $Q = \{q_k\}_{-L+1 \le k \le L-1}$, which are symmetric and have only $L/2 + 1$ distinct non-zero coefficients. Although this pair of filters is not a quadrature mirror filter pair, their role and use in the numerical algorithms is similar. For example, for the Haar functions, we have

$$\{p_k\} = 2^{-1/2}\,\{1/2,\ 1,\ 1/2\}.$$

For Daubechies's wavelet with $L = 2M = 4$, we have

$$\{p_k\} = 2^{-1/2}\,\{-1/16,\ 0,\ 9/16,\ 1,\ 9/16,\ 0,\ -1/16\},$$

and for the “sinc” functions, we have

$$\{p_k\} = 2^{-1/2}\,\{\operatorname{sinc}(k/2)\}_{k \in \mathbb{Z}}.$$

Using these filters $P$ and $Q$, we compute

$$S_k^j = \sum_{l=-L+1}^{L-1} p_l\, S^{j-1}_{k+2^{j-1}l},$$

$$D_k^j = \sum_{l=-L+1}^{L-1} q_l\, S^{j-1}_{k+2^{j-1}l},$$

for $k = 0, \ldots, N-1$.

As for the reconstruction, we immediately obtain a simple formula,

$$S_k^{j-1} = \frac{1}{\sqrt{2}}\,(S_k^j + D_k^j),$$

for $j = 1, \ldots, n_0$, $k = 0, \ldots, N-1$. Given the auto-correlation shell coefficients $\{D_k^j\}_{1 \le j \le n_0,\ 0 \le k \le N-1}$ and $\{S_k^{n_0}\}_{0 \le k \le N-1}$, we have

$$s_k^0 = 2^{-n_0/2}\,S_k^{n_0} + \sum_{j=1}^{n_0} 2^{-j/2}\,D_k^j.$$


4.  RECONSTRUCTION FROM ZERO-CROSSINGS

Since the operator of convolution with $\Psi(x)$ behaves, essentially, as a derivative operator of even order, zero-crossings in our representation are related to the multiscale edges of the original signal. We also have an efficient iterative algorithm to “zoom in” at these zero-crossings (Dubuc's symmetric iterative interpolation [6], [7], [8]). In this section, we briefly describe our reconstruction algorithm from zero-crossings (and slopes at these zero-crossings).
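The decomposition and reconstruction formulas of Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: indices are taken modulo $N$ (periodization), the function names are hypothetical, and the Haar filters $\{p_k\} = 2^{-1/2}\{1/2, 1, 1/2\}$, $q_0 = p_0$, $q_k = -p_k$ are hard-coded for the example.

```python
import numpy as np

def shell_decompose(s0, p, q, n0):
    """Compute S^{n0} and [D^1, ..., D^{n0}] with
    S_k^j = sum_l p_l S^{j-1}_{k + 2^{j-1} l} (same with q for D_k^j),
    all indices taken modulo N."""
    half = (len(p) - 1) // 2              # filters are indexed -half..half
    S = np.asarray(s0, dtype=float)
    details = []
    for j in range(1, n0 + 1):
        step = 2 ** (j - 1)
        S_next = np.zeros_like(S)
        D = np.zeros_like(S)
        for l in range(-half, half + 1):
            shifted = np.roll(S, -step * l)   # shifted[k] = S[(k + step*l) mod N]
            S_next += p[l + half] * shifted
            D += q[l + half] * shifted
        details.append(D)
        S = S_next
    return S, details

def shell_reconstruct(S_n0, details):
    """s_k^0 = 2^{-n0/2} S_k^{n0} + sum_j 2^{-j/2} D_k^j."""
    n0 = len(details)
    s = 2.0 ** (-n0 / 2) * np.asarray(S_n0, dtype=float)
    for j, D in enumerate(details, start=1):
        s = s + 2.0 ** (-j / 2) * D
    return s

# Haar auto-correlation filters (hard-coded for this example).
p_haar = 2.0 ** -0.5 * np.array([0.5, 1.0, 0.5])
q_haar = 2.0 ** -0.5 * np.array([-0.5, 1.0, -0.5])

rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
S_top, D_list = shell_decompose(signal, p_haar, q_haar, n0=3)
recovered = shell_reconstruct(S_top, D_list)
```

Because no subsampling takes place, a circular shift of the input simply shifts every $S^j$ and $D^j$ by the same amount, which is the shift-invariance property listed in Section 3.2.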




4.1.  Computation of Zero-Crossings and Slopes

Using the symmetric iterative interpolation scheme mentioned above, we compute the zero-crossing locations of the set of functions $\{\sum_{k=0}^{N-1} D_k^j\,\Phi(x-k)\}_{1 \le j \le n_0}$ within the prescribed numerical accuracy, e.g., $\epsilon = 10^{\ldots}$. To compute the locations of zero-crossings, we recursively subdivide the unit interval bracketing the zero-crossing until the length of the subdivided interval bracketing that zero-crossing becomes less than the accuracy $\epsilon$. The iterative interpolation scheme allows us to zoom in as much as we want around the zero-crossing. This process requires at most $O(-L \log_2 \epsilon)$ operations per zero-crossing. Once the zero-crossing is found, the computation of the slope requires values at only $2(L-2)$ points around the zero-crossing [8].
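The subdivision loop described above can be sketched as a plain bisection. This is a minimal sketch and not the authors' implementation: `f` is a hypothetical callable that evaluates the interpolated function (a role played in the paper by Dubuc's iterative interpolation), and the default tolerance is an arbitrary illustrative value. In the example the Haar case is assumed, for which $\Phi$ is the hat function and $\sum_k D_k\,\Phi(x-k)$ reduces to linear interpolation of the samples $D_k$.

```python
import numpy as np

def refine_zero_crossing(f, left, right, eps=1e-10):
    """Bisect [left, right], on which f changes sign, until its length is below eps."""
    f_left = f(left)
    while right - left >= eps:
        mid = 0.5 * (left + right)
        f_mid = f(mid)
        if f_left * f_mid <= 0.0:
            right = mid                    # sign change lies in [left, mid]
        else:
            left, f_left = mid, f_mid      # sign change lies in [mid, right]
    return 0.5 * (left + right)

# Haar case (assumption): the interpolant is piecewise linear through the samples.
D = np.array([1.0, -3.0])                  # illustrative samples at k = 0 and k = 1
interpolant = lambda x: np.interp(x, [0.0, 1.0], D)
root = refine_zero_crossing(interpolant, 0.0, 1.0)
```

Each halving gains one binary digit, so roughly $-\log_2 \epsilon$ evaluations are needed per zero-crossing; with an interpolation stencil of width proportional to $L$ this is in line with the operation count quoted above.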

4.2.  The  Problem  of  Reconstruction 

We address the following problem: Given the coarsest subsampled coefficients and the zero-crossings and the slopes at these zero-crossings, where $N_z$ is the number of zero-crossings of the function $\sum_{k=0}^{N-1} D_k^j\,\Phi(x-k)$, reconstruct the original vector $\{s_k^0\}_{0 \le k \le N-1}$.

Proposition 1 provides a simple mechanism for defining a linear system which relates the unknown signal $\{s_k^0\}$ and the values of the function $\Psi(x)$ and its derivative at the integer translates of zero-crossings.

It follows from Proposition 1 that the zero-crossing coordinate $x_m^j$ satisfies

$$\sum_{k=0}^{N-1} s_k^0\,\Psi_{j,k}(x_m^j) = 0,$$

N-I 

k^O 

We  also  have 

k=0 

for $l = 0, 1, \ldots, N_s - 1$, where $N_s = 2^{n-n_0}$. Using these equations, we set up a system of linear algebraic equations for the unknown vector $\{s_k^0\}$,

$$A\,\mathbf{s} = \mathbf{v},$$

where $\mathbf{s} \in \mathbb{R}^{N}$ is a shorthand notation of the original signal $\{s_k^0\}$, and $\mathbf{v} \in \mathbb{R}^{2N_z+N_s}$ is a data vector including the available coefficients. Matrix $A$ is a $(2N_z + N_s) \times N$ matrix and has the following structure:


where  A^  is  a  2Nl  x  N  submatrix  whose  entries  are 

{A’h,,,  = 


(AO,*+,,,  =  2-^$;,,{xi), 

for  k  =  0, Nl  -  1  and  1  =  0, . . . ,  A^  —  1  and  S"°  is  a 
A',  X  N  submatrix  where 

for  k  =  0,. . . ,  N,  and  /  =  0,...,A’  —  1. 

Finally, constraints such as the maximum distance between zero-crossings should be imposed for some “sparse” original signals (e.g., an impulse, a boxcar). These constraints may be expressed as

$$B\,\mathbf{s} = 0,$$

where $B$ is a constraint matrix with $N$ columns.

The problem may now be stated as follows:

Minimize $\|A\,\mathbf{s} - \mathbf{v}\|$ subject to $B\,\mathbf{s} = 0$.

We obtain the least squares solution

$$\mathbf{s} = (A^T A + \lambda B^T B)^{-1} A^T \mathbf{v}.$$

We note that our formulation is completely linear except for the process of zero-crossing detection. See [8] for the details and examples.
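The closed-form solution above amounts to solving one symmetric linear system. The sketch below is illustrative only: the function name is hypothetical, $\lambda$ is treated as a user-chosen penalty weight (the text does not say how it is selected), and the small matrices stand in for the actual $A$, $B$ and $\mathbf{v}$, whose entries come from the zero-crossing data.

```python
import numpy as np

def penalized_least_squares(A, v, B, lam):
    """Solve (A^T A + lam * B^T B) s = A^T v for s."""
    lhs = A.T @ A + lam * (B.T @ B)
    rhs = A.T @ v
    return np.linalg.solve(lhs, rhs)

# Toy data (assumption): identity measurements and one constraint s_0 = s_1.
A_toy = np.eye(3)
v_toy = np.array([1.0, 2.0, 3.0])
B_toy = np.array([[1.0, -1.0, 0.0]])
s_toy = penalized_least_squares(A_toy, v_toy, B_toy, lam=1e6)
```

With a large weight the constraint is enforced almost exactly: the first two entries of `s_toy` are pulled to their common mean 1.5 while the unconstrained third entry stays at 3. Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice made here, not something prescribed by the text.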
ABSTRACT


We build orthonormal and biorthogonal wavelet bases of $L^2(\mathbb{R}^2)$ with dilation matrices of determinant 2. As for the one dimensional case, our construction uses a scaling function which solves a two-scale difference equation associated to a FIR filter. Our wavelets are generated from a single compactly supported mother function. However, the regularity of these functions cannot be derived by the same approach as in the one dimensional case. We review existing techniques to evaluate the regularity of wavelets, and we introduce new methods which allow us to estimate the smoothness of non-separable wavelets and scaling functions in the most general situations. We illustrate these with several examples.
I  Introduction

In the most general sense, wavelet bases are discrete families of functions obtained by dilations and translations of a finite number of well chosen mother functions. The most well known are certainly dyadic orthonormal bases of $L^2(\mathbb{R})$, of the type

(1.1)  $\psi_k^j(x) = 2^{-j/2}\,\psi(2^{-j}x - k)$,  $j \in \mathbb{Z}$, $k \in \mathbb{Z}$.

These constructions have found many interesting applications, both in mathematics because they form Riesz bases for many functional spaces and in signal processing because wavelet expansions are more appropriate than Fourier series to represent the abrupt changes in non-stationary signals.

Several examples have been given by Meyer [Me1], Lemarié [Le] and Daubechies [Dau1], generalizing the classic Haar basis in which the mother wavelet $\psi = \chi_{[0,1/2]} - \chi_{[1/2,1]}$ suffers from a lack of regularity since it is not even continuous. All are based on the concept of multiscale analysis, i.e. a ladder of closed subspaces $\{V_j\}_{j \in \mathbb{Z}}$ which approximates $L^2(\mathbb{R})$,

(1.2)  $\{0\} \to \cdots V_1 \subset V_0 \subset V_{-1} \cdots \to L^2(\mathbb{R})$,

(note that in some papers and in Meyer's book, the converse convention is used, i.e. $V_j \subset V_{j+1}$) and satisfies the following properties,

(1.3)  $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j-1} \Leftrightarrow f(2^j x) \in V_0$,

(1.4)  there exists a function $\varphi(x)$ in $V_0$ such that the set $\{\varphi(x-k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $V_0$.

Since $V_0 \subset V_{-1}$, the scaling function $\varphi(x)$ has to be the solution of a two scale difference equation,

(1.5)  $\varphi(x) = 2\sum_{n \in \mathbb{Z}} c_n\,\varphi(2x - n)$.

The associated wavelet is then derived from the scaling function by the formula

(1.6)  $\psi(x) = 2\sum_{n \in \mathbb{Z}} (-1)^n c_{1-n}\,\varphi(2x - n)$.

In the standard interpretation of a multiresolution analysis, the projections of a function $f$ on the spaces $V_j$ are viewed as successive approximations to $f$, with finer and finer resolution as $j$ decreases. The wavelets can then be used to express the additional details needed to go from one resolution to the next finer level, since the $\{\psi(x-k)\}_{k \in \mathbb{Z}}$ constitute an orthonormal basis for $W_0$, the orthogonal complement of $V_0$ in $V_{-1}$. The whole set $\{\psi_k^j(x)\}_{j,k \in \mathbb{Z}}$ then forms an orthonormal basis of $L^2(\mathbb{R})$.

We  are  here  interested  in  similar  constructions  adapted  to  functions  or  signals  of  more 
than  one  variable. 

The most commonly used method to build a multiresolution analysis and wavelet bases in $L^2(\mathbb{R}^n)$ is the tensor product of multiresolution analyses of $L^2(\mathbb{R})$. In $L^2(\mathbb{R}^2)$ it leads to a ladder of spaces $\mathbf{V}_j = V_j \otimes V_j \subset \mathbf{V}_{j-1}$ generated by the families,

(1.7)  $\Phi_{k,l}^j(x,y) = 2^{-j}\,\varphi(2^{-j}x - k)\,\varphi(2^{-j}y - l)$,  $k, l \in \mathbb{Z}$.

Three wavelets are then necessary to construct the orthogonal complement of $\mathbf{V}_0$ in $\mathbf{V}_{-1}$, namely,

(1.8)  $\Psi_a(x,y) = \varphi(x)\,\psi(y)$

(1.9)  $\Psi_b(x,y) = \psi(x)\,\varphi(y)$

(1.10)  $\Psi_c(x,y) = \psi(x)\,\psi(y)$

Actually, the theory of multiresolution analysis, as it was introduced by S. Mallat and Y. Meyer (see [Ma1] and [Me1]) was first motivated by the possibility of building these separable wavelets for the analysis of digital pictures.

It is clear, however, that this choice is restrictive and that it gives a particular importance to the $x$ and $y$ directions, since $\Psi_a$ and $\Psi_b$ match respectively the horizontal and vertical details.

A more general way of extending multiresolution analysis to $n$ dimensions consists in replacing the axioms (1.3) and (1.4) by

(1.11)  $f(x) \in V_j \Leftrightarrow f(Dx) \in V_{j-1}$

(1.12)  There exists a function $\varphi$ in $V_0$ such that the set $\{\varphi(x-k)\}_{k \in \mathbb{Z}^n}$ is an orthonormal basis for $V_0$

where $D$ is an $n \times n$ dilation matrix.

All the singular values $\lambda_1, \ldots, \lambda_n$ of $D$ must satisfy

(1.13)  $|\lambda_i| > 1$,

to ensure that the approximation gets finer in every direction as $j$ goes to $-\infty$. Furthermore, we require $D$ to have integer entries. This condition means that the action of $D$ on the translation grid $\mathbb{Z}^n$ leads to a sublattice $\Gamma \subset \mathbb{Z}^n$.

The number of basic wavelets required to characterize the orthogonal complement of $V_0$ in $V_{-1}$ is in that case trivially given by the following heuristic argument. This complement should be generated by the action of $\mathbb{Z}^n$ on the basic wavelets, in the same way that $V_0$ is generated by the action of $\mathbb{Z}^n$ on $\varphi$, whereas $V_{-1}$ is generated by the action of $D^{-1}\mathbb{Z}^n$. Consequently, each of the generating functions can be associated with an elementary coset of $D^{-1}\mathbb{Z}^n/\mathbb{Z}^n \simeq \mathbb{Z}^n/D\mathbb{Z}^n$ except one which corresponds to the scaling function (see figure 1). Therefore, $d = |\det D| - 1$ different wavelets are needed. Note that it is not strictly necessary that the entries of $D$ be integer to build wavelet bases using $D$ as the elementary dilation. However, the condition seems to be necessary for the existence of a multiresolution analysis based on a single, real valued, compactly supported scaling function.
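The counting argument can be illustrated by enumerating the cosets of $\mathbb{Z}^2/D\mathbb{Z}^2$ directly. This is a small sketch under assumptions: the function name and the search window are hypothetical, and the two matrices (a quincunx-type matrix with $|\det D| = 2$ and the separable dilation $2I$) are illustrative choices rather than matrices taken from this passage.

```python
import numpy as np

def coset_representatives(D, search=4):
    """Return one representative of each coset of Z^2 / D Z^2 (2x2 integer D)."""
    D = np.asarray(D, dtype=float)
    D_inv = np.linalg.inv(D)
    reps = []
    for x in range(-search, search + 1):
        for y in range(-search, search + 1):
            v = np.array([x, y], dtype=float)
            is_new = True
            for r in reps:
                # v and r are in the same coset iff D^{-1}(v - r) is an integer vector
                w = D_inv @ (v - r)
                if np.allclose(w, np.round(w)):
                    is_new = False
                    break
            if is_new:
                reps.append(v)
    return reps

quincunx = [[1, 1], [1, -1]]     # |det D| = 2 (illustrative choice)
separable = [[2, 0], [0, 2]]     # |det D| = 4, the tensor-product dilation
n_quincunx = len(coset_representatives(quincunx))
n_separable = len(coset_representatives(separable))
```

The number of cosets equals $|\det D|$; removing the one associated with the scaling function leaves $|\det D| - 1$ wavelets, i.e. one for the determinant-2 case and three for the separable case.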

In this work we shall indeed focus on real valued, compactly supported scaling functions and wavelets. They have the advantage that the sequence $\{c_n\}_{n \in \mathbb{Z}}$ introduced in the two scale difference equation (1.5) is real and finite. These coefficients play an important part in the numerical applications because they are used directly in the Fast Wavelet Transform algorithm as decomposition and reconstruction filters. They constitute in that case an FIR (finite impulse response) filter which can be implemented very easily. Furthermore, this finite set of coefficients contains all the information about the multiresolution analysis since the functions $\varphi$ and $\psi$ can be constructed as solutions of (1.5) and (1.6). Our starting point to build wavelet bases will thus be a finite set of coefficients and the associated two-scale difference equation, rather than the approximation spaces $V_j$ themselves.


Figure  1 

and  in  the  case  where  D  = 

The scaling function and the four basic wavelets are indexed by an element of $\mathbb{Z}^2/D\mathbb{Z}^2$.

The main difficulty in this approach is the design of the FIR filter $\{c_n\}_{n=0 \ldots N}$ in such a way that $\varphi$ and $\psi$ are smooth and have orthonormal translates.

In the one dimensional case, it is shown in [Dau1] that orthonormal wavelets can be constructed by choosing a filter which corresponds to a particular case of exact reconstruction subband coding schemes, and which can be made arbitrarily regular by increasing the number of taps in a proper way. Several contributions have followed, giving supplementary information on the type of filter which has to be used (see [Me2], [DL], [Co1], [Dau2], [Co2], [Dau3]).

In the present bidimensional case, the design of filters associated to “nice wavelet bases” turns out to be more difficult because some of the one-dimensional techniques do not generalize trivially (or do not generalize at all!) to higher dimensions and new methods have to be introduced. This article concentrates on the situation where $D$ is a $2 \times 2$ matrix with $|\det D| = 2$.

We deliberately restrict ourselves to this set of matrices for two reasons:




•  These  dilations  have  already  been  considered  by  electrical  engineers  and  seem  to  have 
interesting  applications  in  signal  analysis  and  image  processing.  For  example,  since 
only  one  basic  wavelet  is  required,  one  may  hope  for  a  more  isotropic  analysis  than  with 
the  separable  construction.  Subband  coding  schemes  with  decimation  on  the  quincunx 
sublattice  have  been  studied  in  the  works  of  J.  C.  Feauveau  [Fea]  and  M.  Vetterli  and 
J.  Kovacevic  [KV].  Our  work  is  complementary  to  their  signal  processing  approach 
since we investigate here the mathematical properties, such as the Hölder regularity
of  the  wavelet  bases  associated  to  these  schemes.  This  regularity  is  important  when 
one  asks  that  the  reconstruction  of  the  signal  from  the  coarse  scales  has  a  smooth 
aspect  (see  section  II. 2). 

•  These  dilations  are  simple  and  our  study  will  be  reduced  to  the  case  of  two  basic 
matrices.  However,  the  difficulties  which  appear  in  the  evaluation  of  the  regularity  of 
the  corresponding  wavelets  are  common  to  all  the  non-separable  constructions,  and 
the  techniques  that  we  develop  to  solve  this  problem  can  be  used  for  other  types  of 
dilations. We believe that the set of integer matrices with $|\det D| = 2$ constitutes an interesting “laboratory case” in the general framework of multidimensional wavelets.

In  the  next  section  of  this  paper,  we  shall  give  an  overview  of  different  techniques  which 
can  be  used  in  the  construction  of  one  dimensional  compactly  supported  wavelets.  Some 
new  tools  will  be  introduced  specifically  to  be  generalized  and  used  in  the  multidimensional 
situation. 

The  third  section  examines  the  possible  subband  coding  schemes  with  decimation  on 
the  quincunx  sublattice  and  their  general  relations  with  non-separable  wavelet  bases. 

In  the  fourth  section,  orthonormal  bases  of  wavelets  are  constructed  from  such  coding 
schemes.  We  show  that  for  the  same  filters,  different  bases  with  widely  differing  regularity 
can  be  obtained,  depending  on  the  choice  of  the  dilation  matrix.  Finally,  we  use  a  biorthog- 
onal  approach,  in  section  five,  to  construct  more  symmetrical  wavelet  bases  corresponding 
to linear phase filters and allowing a more isotropic analysis. We show that arbitrarily high regularity can be attained and we give some asymptotic results.


II  The  Construction  of  Compactly  Supported  Wavelets  in 
One  Dimension:  A  Complete  Toolbox 


The  purpose  of  this  section  is  to  review,  in  the  one  dimensional  case,  many  different 
techniques  that  can  be  used  to  build  regular  wavelets  from  subband  coding  schemes,  the¬ 
oretically  and  numerically.  Some  of  these  techniques,  like  the  Littlewood-Paley  estimation 
of  smoothness,  are  not  frequently  used  in  the  one  dimensional  case,  but  they  turn  out  to  be 
very useful for the non-separable bidimensional wavelets. For more details, the reader can also consult [Dau1], [Me1], [Ma1], [Ve1], [Dau2], [Me2], [Co2] . . .
II.1  Wavelet bases and subband coding schemes

II.1.a  The orthonormal case

Let $\{V_j\}_{j \in \mathbb{Z}}$ be a multiresolution analysis of $L^2(\mathbb{R})$. We can use the discrete Fourier transform of the finite sequence $\{c_n\}_{n \in \mathbb{Z}}$, i.e. the transfer function

(2.1)  $m_0(\omega) = \sum_{n \in \mathbb{Z}} c_n e^{-in\omega} = \sum_{n=N_1}^{N_2} c_n e^{-in\omega}$,

to rewrite the two scale difference equation (1.5) that characterizes $\varphi(x)$. We suppose that the $c_n$ are real. Taking the Fourier transform of (1.5) and (1.6) we obtain

(2.2)  $\hat\varphi(2\omega) = m_0(\omega)\,\hat\varphi(\omega)$

(2.3)  $\hat\psi(2\omega) = e^{-i\omega}\,\overline{m_0(\omega+\pi)}\,\hat\varphi(\omega) = m_1(\omega)\,\hat\varphi(\omega)$.

Two  fundamental  properties  of  mo(u))  can  be  derived  from  the  multiresolution  analysis 
properties: 

•  Since $\{\varphi(x-k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis of $V_0$, the Fourier transform $\hat\varphi(\omega)$ satisfies a Poisson identity

(2.4)  $\sum_{n \in \mathbb{Z}} |\hat\varphi(\omega + 2n\pi)|^2 = 1$.

Combined with (2.2) this leads to

(2.5)  $|m_0(\omega)|^2 + |m_0(\omega+\pi)|^2 = 1$

which may also be written as

(2.6)  $2\sum_{n \in \mathbb{Z}} c_n c_{n+2k} = \delta_{k,0}$  (= 1 if $k = 0$, 0 otherwise).

•  The denseness of $\bigcup_j V_j$ in $L^2(\mathbb{R})$ is equivalent to $\hat\varphi(0) = \int \varphi(x)\,dx = 1$ (see [Me1], [Ma1] or [Co1]).

Consequently, we have

(2.7)  $m_0(0) = 1$ and $m_0(\pi) = 0$,

which may also be written as

(2.8)  $\sum_{n=N_1}^{N_2} c_n = 1$ and $\sum_{n=N_1}^{N_2} (-1)^n c_n = 0$.
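Conditions (2.5) to (2.8) are easy to verify numerically for a concrete filter. The sketch below is illustrative: it assumes the standard 4-tap Daubechies coefficients, rescaled so that they sum to 1 as required by (2.8), with $N_1 = 0$ and $N_2 = 3$; the function names are hypothetical.

```python
import numpy as np

s3 = np.sqrt(3.0)
# 4-tap Daubechies coefficients normalized so that sum(c) = 1 (assumption).
c = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0

def m0(omega, c):
    """Transfer function m0(omega) = sum_n c_n exp(-i n omega), n = 0..len(c)-1."""
    omega = np.atleast_1d(omega)
    n = np.arange(len(c))
    return np.sum(c * np.exp(-1j * np.outer(omega, n)), axis=1)

def double_shift_products(c):
    """Return 2 * sum_n c_n c_{n+2k} for k = 0, 1, ... (left side of (2.6))."""
    L = len(c)
    return np.array([2.0 * np.dot(c[:L - 2 * k], c[2 * k:]) for k in range(L // 2)])

omega = np.linspace(-np.pi, np.pi, 257)
cqf_defect = np.max(np.abs(np.abs(m0(omega, c)) ** 2
                           + np.abs(m0(omega + np.pi, c)) ** 2 - 1.0))
shift_products = double_shift_products(c)      # should be [1, 0]
sum_c = c.sum()                                # should be 1
alt_sum_c = np.sum((-1.0) ** np.arange(len(c)) * c)   # should be 0
```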

The subband coding scheme associated to our multiresolution analysis appears clearly in the Fast Wavelet Transform Algorithm of S. Mallat [Ma2]. Let us recall how it works. The initial data are considered as the approximation of a continuous function at the scale $j = 0$,

(2.9)  $S_k^0 = \langle f \,|\, \varphi(x-k) \rangle$,  $k \in \mathbb{Z}$.

This allows the computation of the approximations and the details at coarser scales, i.e.

(2.10)  $S_k^j = 2^{-j/2}\langle f \,|\, \varphi_k^j \rangle$ and $D_k^j = 2^{-j/2}\langle f \,|\, \psi_k^j \rangle$,  $j > 0$.

(The coefficients are normalized in such a way that if $f = 1$ locally, then $S_k^j = 1$ in that area.) The sequence $\{S_k^j\}_{k \in \mathbb{Z}}$ (resp. $\{D_k^j\}_{k \in \mathbb{Z}}$) is then derived from $\{S_k^{j-1}\}_{k \in \mathbb{Z}}$ by a convolution with the filter $m_0(\omega)$ (resp. $m_1(\omega)$) followed by a decimation of one sample out of two to keep the same total amount of information, i.e.

$$S_k^j = \sum_n c_{n-2k}\,S_n^{j-1}, \qquad D_k^j = \sum_n (-1)^n c_{1-n+2k}\,S_n^{j-1}.$$

The algorithm then iterates on $\{S_k^j\}_{k \in \mathbb{Z}}$. Conversely, the sequence $\{S_k^{j-1}\}_{k \in \mathbb{Z}}$ can be recovered by applying the same filters $m_0(\omega)$ and $m_1(\omega)$ on $\{S_k^j\}_{k \in \mathbb{Z}}$ and $\{D_k^j\}_{k \in \mathbb{Z}}$ after inserting a zero between every pair of consecutive samples, and summing the two components (multiplied by two for normalization purposes), i.e.

$$S_n^{j-1} = 2\sum_k \left[ c_{n-2k}\,S_k^j + (-1)^n c_{1-n+2k}\,D_k^j \right].$$

All these operations, decomposition, decimation, interpolation and reconstruction, constitute a complete subband coding scheme as shown on figure 2. The property of exact reconstruction can now be derived in two ways. It is a natural consequence of the multiresolution approach, since $V_j = V_{j+1} \oplus W_{j+1}$, but it can also be viewed as a consequence of formula (2.5) for the filter $m_0$. This type of filter pair $(m_0, m_1)$ is known as a pair of “conjugate quadrature filters” (CQF); they were first discovered by Smith and Barnwell in 1983 [SB1]. The design of FIR pairs, with real coefficients and perfect reconstruction, has been generalized in [Dau1]. It also appears in [ASH], [SB2], [Ve1].
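One decomposition step and its inverse can be sketched as follows. This is a minimal illustration under assumptions, not the algorithm of [Ma2] verbatim: the signal is treated as periodic with even length $N$, the filter is indexed from $n = 0$, the function names are hypothetical, and the 4-tap Daubechies coefficients (normalized to sum to 1) serve as the example filter.

```python
import numpy as np

def fwt_step(S_prev, c):
    """One analysis step: S_k = sum_n c_{n-2k} S_n, D_k = sum_n (-1)^n c_{1-n+2k} S_n."""
    N = len(S_prev)
    S = np.zeros(N // 2)
    D = np.zeros(N // 2)
    for k in range(N // 2):
        for m, cm in enumerate(c):
            S[k] += cm * S_prev[(2 * k + m) % N]          # n = 2k + m
            n = 2 * k + 1 - m                             # so that c_{1-n+2k} = c_m
            D[k] += (-1) ** (n % 2) * cm * S_prev[n % N]
    return S, D

def ifwt_step(S, D, c):
    """One synthesis step: S_n = 2 sum_k [c_{n-2k} S_k + (-1)^n c_{1-n+2k} D_k]."""
    N = 2 * len(S)
    out = np.zeros(N)
    for k in range(len(S)):
        for m, cm in enumerate(c):
            out[(2 * k + m) % N] += 2.0 * cm * S[k]
            n = 2 * k + 1 - m
            out[n % N] += 2.0 * (-1) ** (n % 2) * cm * D[k]
    return out

s3 = np.sqrt(3.0)
c_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0   # illustrative CQF

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
approx, detail = fwt_step(x, c_d4)
x_rec = ifwt_step(approx, detail, c_d4)
const_approx, const_detail = fwt_step(np.ones(8), c_d4)
```

With this normalization a locally constant signal gives approximation coefficients equal to 1 and vanishing details, in line with the remark following (2.10), and the synthesis step recovers the input exactly because the filter satisfies (2.6).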


Figure 2

Subband coding scheme corresponding to the FWT algorithm.

The sign 2↓ stands for “decimation of one sample out of two” and 2↑ for the insertion of zeros at the intermediate values.
Since $m_0(\omega)$ is regular (it is a trigonometric polynomial) and since $m_0(0) = 1$, we can iterate (2.2) to obtain

(2.11)  $\hat\varphi(\omega) = \prod_{k=1}^{+\infty} m_0(2^{-k}\omega)$.

Given a conjugate quadrature filter $m_0(\omega)$ (i.e. a trigonometric polynomial satisfying (2.5) and (2.7)), it is thus possible to define the scaling function, either as a solution of the two scale difference equation (1.5), or explicitly with the above infinite product. However, this does not always lead to a multiresolution analysis: the function $\varphi(x) = \frac{1}{3}\chi_{[0,3]}(x)$ generated by the CQF $m_0(\omega) = \frac{1 + e^{-3i\omega}}{2}$, for example, does not satisfy the orthonormality of the translates. Orthonormality of the $\varphi(x-k)$ turns out to be equivalent to the convergence of the truncated products $\hat\varphi_n(\omega) = \prod_{k=1}^{n} m_0(2^{-k}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$ to $\hat\varphi(\omega)$ (because $\{\varphi_n(x-k)\}_{k \in \mathbb{Z}}$ is an orthonormal set as soon as (2.5) is satisfied).
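The counterexample can be checked directly. The sketch below assumes that the filter in question is $m_0(\omega) = (1 + e^{-3i\omega})/2$, which is the filter whose two scale equation is solved by $\varphi = \frac{1}{3}\chi_{[0,3]}$; the function names are hypothetical. It verifies that (2.5) holds while the integer translates of $\varphi$ are not orthonormal.

```python
import numpy as np

def m0_stretched(omega):
    """m0(omega) = (1 + exp(-3 i omega)) / 2 (assumed form of the counterexample)."""
    return 0.5 * (1.0 + np.exp(-3j * np.asarray(omega)))

def box_inner_product(k):
    """<phi, phi(. - k)> for phi = (1/3) * indicator of [0, 3], computed exactly."""
    overlap = max(0.0, 3.0 - abs(k))       # length of [0,3] intersected with [k,k+3]
    return overlap / 9.0

omega = np.linspace(-np.pi, np.pi, 513)
cqf_defect = np.max(np.abs(np.abs(m0_stretched(omega)) ** 2
                           + np.abs(m0_stretched(omega + np.pi)) ** 2 - 1.0))
gram = [box_inner_product(k) for k in range(4)]   # k = 0, 1, 2, 3
```

The filter satisfies (2.5) to machine precision, yet $\langle \varphi, \varphi \rangle = 1/3$ and $\langle \varphi, \varphi(\cdot - 1) \rangle = 2/9$, so the translates are neither normalized nor orthogonal.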

More precisely, the following result characterizes the subclass of CQF filters leading to a multiresolution analysis and orthonormal bases of wavelets.

Theorem 2.1  Let $m_0(\omega)$ be a Conjugate Quadrature Filter. Then, the infinite product (2.11) leads to a multiresolution analysis if and only if there exists a compact set $K \subset \mathbb{R}$ such that,

i)  $K$ contains a neighbourhood of the origin,

ii)  $|K| = 2\pi$ and for all $\omega$ in $[-\pi, \pi]$, there exists $n \in \mathbb{Z}$ such that $\omega + 2n\pi \in K$,

iii)  for all $n > 0$, $m_0(2^{-n}\omega)$ does not vanish on $K$.

The set $K$ is said to be “congruent to $[-\pi, \pi]$ modulo $2\pi$” (figure 3). The proof of this result can be found in [Co1]. It exploits the continuity of $m_0$, the compactness of $K$ and $m_0(0) = 1$ to show that (iii) is equivalent to $|\hat\varphi(\omega)| \ge c > 0$ on $K$. This is then sufficient to derive the convergence of the $\hat\varphi_n$ by Lebesgue's theorem. We shall use a multidimensional generalization of Theorem 2.1 in the fourth section.


Figure 3

Example of compact set congruent to $[-\pi, \pi]$ modulo $2\pi$.


II.1.b  The biorthogonal case

The conjugate quadrature filters are a very particular case of subband coding scheme with perfect reconstruction, because identical filters (up to a complex conjugation) are used for both the decomposition and the reconstruction stages. If we don't impose this restriction, then the scheme uses four different filters: $\tilde m_0(\omega)$ and $\tilde m_1(\omega)$ for the decomposition, $m_0(\omega)$ and $m_1(\omega)$ for the reconstruction. Perfect reconstruction for any discrete signal is then ensured if,

(2.12)  $\overline{\tilde m_0(\omega)}\,m_0(\omega) + \overline{\tilde m_1(\omega)}\,m_1(\omega) = 1$,
        $\overline{\tilde m_0(\omega+\pi)}\,m_0(\omega) + \overline{\tilde m_1(\omega+\pi)}\,m_1(\omega) = 0$.

$m_0(\omega)$ and $m_1(\omega)$ may thus be regarded as the solutions of a linear system. However, to avoid the infinite impulse response solutions, we shall force the determinant of this system to be $\alpha e^{ik\omega}$, $\alpha \ne 0$, $k \in \mathbb{Z}$. For sake of convenience we take $\alpha = -1$ and $k = 1$ (a change of these values would only mean a shift and a scalar multiplication on the impulse response of our filters). This leads to

(2.13)  $\overline{\tilde m_0(\omega)}\,m_0(\omega) + \overline{\tilde m_0(\omega+\pi)}\,m_0(\omega+\pi) = 1$,

and

(2.14)  $m_1(\omega) = e^{-i\omega}\,\overline{\tilde m_0(\omega+\pi)}$,  $\tilde m_1(\omega) = e^{-i\omega}\,\overline{m_0(\omega+\pi)}$.

The formulas (2.13) and (2.14) are thus the most general setting for finite impulse response subband coders with exact reconstruction (in the two channels case). The functions $m_0(\omega)$ and $\tilde m_0(\omega)$ are called “dual filters”. It is clear that the special case $m_0(\omega) = \tilde m_0(\omega)$ corresponds to the conjugate quadrature filters of II.1.a. However, dual filters are easier to design than CQF's. For example, if $m_0$ is fixed, $\tilde m_0$ can be found as the solution of a Bezout problem which is equivalent to a linear system. The coefficients of these filters can be very simple numerically (in particular they can have finite binary expansion which is very useful for practical implementation); furthermore they can be chosen symmetrical (“linear phase filter”), a property which is impossible to satisfy in the CQF case.
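As an illustration of dual filters with symmetric, dyadic coefficients, one can check the duality condition for a short linear phase pair. The pair below (3 and 5 taps, centred at $n = 0$) is an illustrative choice not taken from the text; the condition tested is the product relation between the two low-pass filters at $\omega$ and $\omega + \pi$, and the function names are hypothetical.

```python
import numpy as np

def m0(omega):
    """Symmetric 3-tap filter with coefficients (1/4, 1/2, 1/4) at n = -1, 0, 1."""
    return 0.25 * np.exp(1j * omega) + 0.5 + 0.25 * np.exp(-1j * omega)

def m0_tilde(omega):
    """Symmetric 5-tap dual filter with coefficients (-1, 2, 6, 2, -1)/8 at n = -2..2."""
    return (-np.exp(2j * omega) + 2.0 * np.exp(1j * omega) + 6.0
            + 2.0 * np.exp(-1j * omega) - np.exp(-2j * omega)) / 8.0

def duality_defect(omega):
    """Max deviation of conj(m0~(w)) m0(w) + conj(m0~(w+pi)) m0(w+pi) from 1."""
    lhs = (np.conj(m0_tilde(omega)) * m0(omega)
           + np.conj(m0_tilde(omega + np.pi)) * m0(omega + np.pi))
    return np.max(np.abs(lhs - 1.0))

omega = np.linspace(-np.pi, np.pi, 513)
defect = duality_defect(omega)
```

Both filters have coefficients with finite binary expansions and are symmetric, two properties the text singles out as advantages of the biorthogonal setting.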

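We can mimic, in this more general framework, the construction of orthonormal wavelets from CQF. Assuming that $m_0(0) = \tilde m_0(0) = 1$ and $m_0(\pi) = \tilde m_0(\pi) = 0$, we define

(2.15)  $\hat\varphi(\omega) = \prod_{k=1}^{+\infty} m_0(2^{-k}\omega)$

(2.16)  $\hat\psi(2\omega) = m_1(\omega)\,\hat\varphi(\omega)$

(2.17)  $\hat{\tilde\varphi}(\omega) = \prod_{k=1}^{+\infty} \tilde m_0(2^{-k}\omega)$

(2.18)  $\hat{\tilde\psi}(2\omega) = \tilde m_1(\omega)\,\hat{\tilde\varphi}(\omega)$.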

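In [CDF], the following theorem was proved.

Theorem 2.2

•  If $\hat\varphi_n(\omega) = \prod_{k=1}^{n} m_0(2^{-k}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$ and $\hat{\tilde\varphi}_n(\omega) = \prod_{k=1}^{n} \tilde m_0(2^{-k}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$ converge in $L^2(\mathbb{R})$ respectively to $\hat\varphi(\omega)$ and $\hat{\tilde\varphi}(\omega)$, then the following duality relations are satisfied:

(2.19)  $\langle \varphi(x-k) \,|\, \tilde\varphi(x-k') \rangle = \delta_{k,k'}$

(2.20)  $\langle \psi_k^j \,|\, \tilde\psi_{k'}^{j'} \rangle = \delta_{j,j'}\,\delta_{k,k'}$

and for all $f$ in $L^2(\mathbb{R})$ one has the unique decomposition

(2.21)  $f = \lim_{J \to +\infty} \sum_{j=-J}^{J} \sum_{k \in \mathbb{Z}} \langle f \,|\, \tilde\psi_k^j \rangle\,\psi_k^j$  (in the $L^2$ sense).

•  If $\varphi$ and $\tilde\varphi$ satisfy $|\hat\varphi(\omega)| + |\hat{\tilde\varphi}(\omega)| \le \ldots$ for some $\epsilon > 0$, then the families $\{\psi_k^j\}_{j,k \in \mathbb{Z}}$ and $\{\tilde\psi_k^j\}_{j,k \in \mathbb{Z}}$ are frames of $L^2(\mathbb{R})$.

•  When these two properties hold, then $\{\psi_k^j, \tilde\psi_k^j\}_{j,k \in \mathbb{Z}}$ are biorthogonal (or dual) Riesz bases of $L^2(\mathbb{R})$.

Many examples of these systems can be found in [CDF] and a sharper analysis of the frame conditions is developed in [CD]. We now recall a practical way of constructing $\varphi$ and $\psi$ numerically from a given subband coding scheme.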

II.2  The cascade algorithm

In the last section we saw that the scaling function $\varphi(x)$ could be approximated, at least in $L^2(\mathbb{R})$, by a sequence of band limited functions $\{\varphi_n\}_{n>0}$ defined by

(2.22)  $\hat\varphi_n(\omega) = \prod_{j=1}^{n} m_0(2^{-j}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$.

These functions are characterized by their sampled values at the points $2^{-n}k$ ($k \in \mathbb{Z}$), i.e.,

(2.23)  $s_k^n = \varphi_n(2^{-n}k)$.

This sequence can also be considered as the impulse response of the transfer function

(2.24)  $S_n(\omega) = 2^n \prod_{j=0}^{n-1} m_0(2^j\omega)$.

$S_n(\omega)$ can be obtained recursively by the formula

(2.25)  $S_{n+1}(\omega) = 2\,m_0(\omega)\,S_n(2\omega)$.

In the time domain, (2.25) becomes an interpolation scheme: the sequence is dilated by insertion of zeros ($S_n(\omega) \to S_n(2\omega)$) before being filtered (multiplication by $2m_0(\omega)$). We have thus

(2.26)  $s_k^{n+1} = 2\sum_l c_{k-2l}\,s_l^n$.

This iterative process, which computes the sequences $\{s_k^n\}_{k \in \mathbb{Z}}$ from an initial Dirac sequence $s_k^0 = \delta_{0,k}$, is called the “cascade algorithm”. We illustrate it on figure 4 (our sequences are represented by piecewise constant functions).
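The recursion (2.26) takes only a few lines to implement. The sketch below is a minimal version under assumptions: the filter is indexed from $n = 0$, the function name is hypothetical, and the Haar filter $c = (1/2, 1/2)$ and the 4-tap Daubechies filter (normalized to sum to 1) are illustrative inputs.

```python
import numpy as np

def cascade(c, n_iter):
    """Iterate s^{n+1}_k = 2 * sum_l c_{k-2l} s^n_l starting from the Dirac sequence.

    Returns the samples s^n_k, which approximate phi(2^{-n} k), k = 0, 1, ...
    """
    c = np.asarray(c, dtype=float)
    s = np.array([1.0])                     # s^0_k = delta_{0,k}
    for _ in range(n_iter):
        up = np.zeros(2 * len(s) - 1)
        up[::2] = s                         # insert a zero between consecutive samples
        s = 2.0 * np.convolve(up, c)        # filter with 2 * m0
    return s

haar_samples = cascade([0.5, 0.5], 3)       # indicator of [0, 1) sampled at step 1/8

s3 = np.sqrt(3.0)
c_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
d4_samples = cascade(c_d4, 5)               # samples at step 2^{-5}
d4_integral = d4_samples.sum() * 2.0 ** -5  # Riemann sum of the approximation
```

For the Haar filter the iteration reproduces the indicator function exactly; for any filter with $\sum_n c_n = 1$ the sum of the samples doubles at each step, so the Riemann sum stays equal to 1, consistent with $\int \varphi(x)\,dx = 1$.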
Note that it coincides exactly with the reconstruction stage in the FWT algorithm described in II.1.a. The scaling function is thus approached by the signal reconstructed from a single approximation coefficient at a coarse scale. Similarly, the wavelet will be obtained by starting the reconstruction from a detail coefficient at a coarse scale (and thus applying $m_1(\omega)$ at the first step of the cascade).


Figure 4

The cascade algorithm (from [Dau1])

This  explains  why  subband  coding  schemes  associated  with  regular  wavelets  are  particu¬ 
larly  interesting;  the  smoothness  of  the  wavelet  determines  the  appearance  of  the  coarse 
scale  components  of  the  reconstructed  signal.  A  smooth  appearance  is  important  for  many 
applications  such  as  compression  where  a  big  part  of  the  finer  scale  information  is  thrown 
away. 
In the biorthogonal case, the analysis and the synthesis wavelets ($\tilde\psi$ and $\psi$) need not have the same regularity. As just discussed, smoothness is important for the reconstructing function; the analyzing function needs only to be sufficiently regular to ensure that the wavelet bases are unconditional, so that the FWT algorithm is stable. Note that an important property of the analyzing wavelet is cancellation, i.e. vanishing moments, ensuring small high scale coefficients for smooth regions in the function or signal to be analyzed.

Let us finally mention that this type of "refinement method" is well known in approximation
theory as "stationary subdivision" (e.g. [CDM], [DyL]). Most of these papers are
motivated by interpolation problems, where smooth curves or surfaces need to be constructed,
connecting (or close to) given sparse data points. Consequently, they are mainly
concerned with what we call the reconstruction stage and they do not study the existence
of an associated subband coding scheme. This also means that they do not care about an
easy way of encoding or representing the extra "detail information" ($W_j$) that can be
added in going from one refinement level to the next one ($V_j \to V_{j-1}$). On the other hand,
the subband coding literature seldom mentions the importance of the smoothness appearing
in the cascade of the reconstruction from the low scales. Orthonormal and biorthogonal
wavelet bases lead to an elegant combination of these two approaches.

We now present several different methods to estimate the regularity of the wavelets
associated to a given subband coding scheme. We shall concentrate on the regularity of the
scaling function, which determines the regularity of the wavelet itself because $\psi$ is a finite
linear combination of translates of $\varphi(2x)$. Whatever the method used, if a global regularity
of order $r$ is achieved, then the cascade algorithm also converges uniformly up to this order
(see [Dau1], [DL], [Co2]).


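II.3 Regularity: the spectral approach

II.3.a A Fourier estimation of the Hölder exponent

Let us denote by $C^\alpha$ the Hölder space defined as follows. For $\alpha = n + \beta$, with $n$ a nonnegative integer and $\beta \in [0, 1[$, $f \in C^\alpha$
if and only if it is $n$ times continuously differentiable and, for all $x$, $y$, $|f^{(n)}(x) - f^{(n)}(y)| \le C(f)\,|x - y|^\beta$.

Define also

(2.27)  $\mathcal{F}^\alpha_p = \{ f \mid (1 + |\omega|)^\alpha \hat f(\omega) \in L^p \}$  ($\alpha \ge 0$, $p \ge 1$).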

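It is well known (and easy to check) that $\mathcal{F}^{\alpha+1+\epsilon}_\infty \subset \mathcal{F}^\alpha_1 \subset C^\alpha$, for $\epsilon > 0$. For compactly
supported functions $f$, we also have

(2.28)  $f \in C^\alpha \Rightarrow f \in \mathcal{F}^\alpha_\infty$,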


so that the decay of the Fourier transform can be used to evaluate the global regularity. To
estimate this decay in the case of the scaling function, it is possible to use the factorization
of $m_0(\omega)$; due to its cancellation at $\omega = \pi$, we have indeed

(2.29)  $m_0(\omega) = \left( \frac{1 + e^{i\omega}}{2} \right)^N p(\omega)$.
The infinite product (2.11) is thus divided in two parts. The first part, which comes from
the factor $\left( \frac{1 + e^{i\omega}}{2} \right)^N$, gives decay, since

(2.30)  $\prod_{k=1}^{+\infty} \left| \frac{1 + e^{i 2^{-k}\omega}}{2} \right| = \prod_{k=2}^{+\infty} |\cos(2^{-k}\omega)| = \left| \frac{2}{\omega} \sin(\omega/2) \right|$.

The second part, which involves the factor $p(\omega)$, can be controlled by a polynomial expression.
Indeed, since $p(0) = 1$ and $p$ is a regular function, the infinite product generated by
the second factor satisfies

(2.31)  $\prod_{k=1}^{+\infty} |p(2^{-k}\omega)| \le C \prod_{1 \le k \le \log(1+|\omega|)/\log 2} |p(2^{-k}\omega)|$.


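Defining, for $j > 0$,

(2.32)  $B_j = \sup_{\omega \in \mathbb{R}} \left| \prod_{k=0}^{j-1} p(2^k\omega) \right|$

and

(2.33)  $b_j = \frac{\log B_j}{j \log 2}$,

we obtain

(2.34)  $\prod_{k=1}^{+\infty} |p(2^{-k}\omega)| \le C (1 + |\omega|)^{b_j}$

and

(2.35)  $|\hat\varphi(\omega)| \le C (1 + |\omega|)^{b_j - N}$.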


Consequently, $\varphi$ is in $\mathcal{F}^{N - b_j}_\infty$ and in $C^\alpha$ if $\alpha < N - b_j - 1$ for some $j > 0$. We see here that $N$ must
be large to allow high regularity since $b_j$ is always positive. In fact, one can prove that if the
wavelet is $r$ times continuously differentiable then it has at least $r + 1$ vanishing moments
(see [Me1], [Dau1]), i.e. $\left(\frac{d}{d\omega}\right)^n \hat\psi(0) = \left(\frac{d}{d\omega}\right)^n m_0(\pi) = 0$ for $n = 0, \dots, r$, and
thus $N \ge r + 1$. These cancellations are also known as the Fix-Strang conditions [SF];
they are equivalent to the property that the polynomials of order $N - 1$ can be expressed
as linear combinations of the $\{\varphi(x - k)\}_{k \in \mathbb{Z}}$. However, these conditions are necessary but
not sufficient to ensure the regularity of the scaling function since the effect of $N$ may be
killed by a large value of $b_j$. Fortunately, this can be avoided by a careful choice of the filter
$m_0(\omega)$ (and, in the biorthogonal case, additionally $\tilde m_0(\omega)$).

In the CQF-orthonormal case, a particular family of FIR filters indexed by $N$ has been
constructed in [Dau1]. This construction uses the polynomial

(2.36)  $P_N(y) = \sum_{j=0}^{N-1} \binom{N-1+j}{j} y^j$

(with the shorthand notation $y = \sin^2(\omega/2)$), which is the lowest degree solution of the
Bezout problem

(2.37)  $P_N(y)(1-y)^N + y^N P_N(1-y) = 1$.
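The Bezout identity (2.37) is easy to check numerically. The sketch below evaluates (2.36) directly; the function names `P` and `bezout_residual` are ours, and the grid of test points is an arbitrary choice.

```python
from math import comb

def P(N, y):
    """P_N(y) = sum_{j=0}^{N-1} C(N-1+j, j) y^j, cf. (2.36)."""
    return sum(comb(N - 1 + j, j) * y**j for j in range(N))

def bezout_residual(N, y):
    """Left-hand side of (2.37) minus 1; it should vanish for every y."""
    return P(N, y) * (1 - y) ** N + y**N * P(N, 1 - y) - 1

# Largest residual over N = 1..8 and y = 0, 0.05, ..., 1 (rounding errors only).
worst = max(abs(bezout_residual(N, k / 20)) for N in range(1, 9) for k in range(21))
```

Setting $y = 1/2$ in (2.37) gives $P_N(1/2) = 2^{N-1}$, which the same function reproduces.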
The corresponding filters are defined by

(2.38)  $m_0^N(\omega) = \left( \frac{1 + e^{i\omega}}{2} \right)^N p_N(\omega)$,

(2.39)  $|p_N(\omega)|^2 = P_N(y) = P_N\left( \frac{1 - \cos\omega}{2} \right)$.

The Fejér-Riesz lemma guarantees that there exists an FIR filter $p_N(\omega)$ which satisfies (2.39).
It is clear that the CQF condition (2.5) is equivalent to (2.37) and the conditions in Theorem 2.1
are trivially satisfied with $K = [-\pi, \pi]$. For large values of $N$, the regularity $\alpha(N)$
of the associated scaling function is approximately $0.2\,N$ and the exact asymptotic ratio
between $\alpha(N)$ and $N$ can be determined. Intuitively speaking, this means that the contribution
of $p_N(\omega)$ removes "eighty percent of the regularity" brought by the factor $\left( \frac{1 + e^{i\omega}}{2} \right)^N$.

For this estimation, we need to optimize the inequality (2.35), i.e. find the best possible
exponent for the decay of $\hat\varphi(\omega)$.

II.3.b Optimal and asymptotical Fourier estimation: The role of fixed points

We start by defining "the critical exponent of $m_0(\omega)$":

(2.40)  $b = \inf_{j>0} b_j = \inf_{j>0} \max_{\omega \in \mathbb{R}} \left( \frac{1}{j \log 2} \log \left| \prod_{k=0}^{j-1} p(2^k\omega) \right| \right)$.

Then, it was proved in [Co2] that under the hypothesis $|p(\pi)| > |p(0)| = 1$ (satisfied in
the present case (2.39)), $\hat\varphi(\omega)$ cannot have a better decay at infinity than $|\omega|^{b-N}$. If the
infimum $b$ is attained for some finite $j$, $b = b_j$, then this estimate is optimal.

How can we estimate the critical exponent? A first method consists in evaluating $b_j$ for
large values of $j$. Indeed, $b$ is also the limit of the sequence $b_j$ because the boundedness of
$p$ implies $b_{j'} \le b_j + O(j/j')$ for $j' > j$. This may however require heavy computations.
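To give an idea of this first method, here is a small grid-based sketch for the case $N = 2$ of the family (2.38)-(2.39), where $|p(\omega)|^2 = P_2(\sin^2(\omega/2)) = 1 + 2\sin^2(\omega/2)$. The supremum in (2.32) is replaced by a maximum over a finite grid, so the result is only an approximation; the grid size and the names `p_abs`, `b_j` and `lower` are our own choices.

```python
import math

def p_abs(omega):
    """|p(omega)| for N = 2: |p|^2 = 1 + 2 sin^2(omega/2)."""
    return math.sqrt(1.0 + 2.0 * math.sin(omega / 2.0) ** 2)

def b_j(j, grid_bits=12):
    """Grid estimate of b_j = log(max_w |prod_{k<j} p(2^k w)|) / (j log 2), cf. (2.32)-(2.33)."""
    n = 3 * 2**grid_bits  # this grid contains pi and 2*pi/3 (up to rounding)
    best = 0.0
    for i in range(n):
        w = 2.0 * math.pi * i / n
        prod = 1.0
        for k in range(j):
            prod *= p_abs(2**k * w)
        best = max(best, prod)
    return math.log(best) / (j * math.log(2.0))

# Value of log|p(2*pi/3)| / log 2, which b_j cannot go below for this filter.
lower = math.log(p_abs(2.0 * math.pi / 3.0)) / math.log(2.0)
```

The cost of a careful evaluation grows quickly with $j$, since the product oscillates on a scale of order $2^{-j}$; this is the "heavy computation" mentioned above.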

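In several cases, it is possible to use a more powerful method based on the transformation
$\tau : \omega \mapsto 2\omega$ modulo $2\pi$ and the fixed points of its powers $\tau^n$, $n > 0$. Indeed, let $\omega_0$ be a
fixed point of $\tau^n$ for $n > 0$ and define its orbit $\omega_j = \tau^j \omega_0$ for $j = 0, \dots, n-1$. Since $p(\omega)$
has period $2\pi$, we have

(2.41)  $p(2^{kn}\omega_j) = p(\omega_j)$, for all $k \ge 0$,

and consequently

(2.42)  $B_{kn} \ge \left| \prod_{j=0}^{n-1} p(\omega_j) \right|^k$.

Letting $k$ go to $+\infty$, this leads to

(2.43)  $b \ge \frac{1}{n \log 2} \log \left| \prod_{j=0}^{n-1} p(\omega_j) \right|$.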
Fixed points of $\tau^n$ therefore lead to lower bounds for $b$ and upper bounds for the regularity
index. In fact they can do much better and provide optimal estimates for certain types of filters.
Let us consider the smallest orbit of $\tau$ different from $\{0\}$, namely the pair $\{\frac{2\pi}{3}, -\frac{2\pi}{3}\}$.
Note that, because our filters have real coefficients, $|m_0(\omega)|$ and $|p(\omega)|$ are even functions so
that $|p(\frac{2\pi}{3})| = |p(-\frac{2\pi}{3})|$. The following result relates the value $|p(\frac{2\pi}{3})|$ to the critical
exponent $b$.

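Theorem 2.3 Suppose that $p(\omega)$ satisfies

(2.44)  $|p(\omega)| \le |p(\frac{2\pi}{3})|$  if $|\omega| \le \frac{2\pi}{3}$,

(2.44')  $|p(\omega)\,p(2\omega)| \le |p(\frac{2\pi}{3})|^2$  if $\frac{2\pi}{3} \le |\omega| \le \pi$.

Then

(2.45)  $b = \frac{1}{\log 2} \log |p(\frac{2\pi}{3})|$.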

Proof:

We already know from (2.43) that $b \ge \frac{1}{\log 2} \log |p(\frac{2\pi}{3})|$. We now use the bounds on $p$ to
find an upper bound for $b_j$, $j > 0$. We can regroup the factors in (2.32) into packets of one
or two elements in order to apply either (2.44) or (2.44') to each block. Since only the last
factor can miss one of these two inequalities, we obtain

(2.46)  $\sup_\omega \left| \prod_{k=0}^{j-1} p(2^k\omega) \right| \le |p(\frac{2\pi}{3})|^{j-1} \sup |p|$,

and thus,

(2.47)  $b_j \le \frac{1}{\log 2} \left[ \frac{j-1}{j} \log |p(\frac{2\pi}{3})| + \frac{1}{j} \sup [\log |p|] \right]$,

which leads to

(2.48)  $b \le \frac{1}{\log 2} \log |p(\frac{2\pi}{3})|$

and to (2.45).

The equality (2.45) means that the worst decay of $\hat\varphi(\omega)$ occurs for the sequence $\omega_n = 2^n \frac{2\pi}{3}$,
$n > 0$. This is interesting, because (2.44) and (2.44') turn out to be satisfied in many cases
and in particular for the whole family of CQF defined by (2.38), (2.39). This is easy to
check directly for small values of $N$, since the inequalities can be rewritten as

(2.49)  $P_N(y) \le P_N(\frac{3}{4})$  if $y \le \frac{3}{4}$,

(2.49')  $P_N(y)\,P_N(4y(1-y)) \le [P_N(\frac{3}{4})]^2$  if $\frac{3}{4} \le y \le 1$.
The discussion for general $N$ is more difficult and we refer to [CC] for a complete proof of
(2.49), (2.49'). However, a similar result can be obtained in a simple way. To characterize
the asymptotic behavior of the critical exponent when $N$ goes to $+\infty$, one does not need
the full force of (2.44), (2.44'). It can also be derived from a weaker, asymptotically
valid inequality, as proved by H. Volkmer in [V]:

Theorem 2.4 Let $b(N)$ be the critical exponent associated to $m_0^N(\omega)$ and $\alpha(N)$ the Hölder
exponent of the corresponding scaling function. Then

(2.50)  $\lim_{N \to +\infty} \frac{b(N)}{N} = \frac{\log 3}{2 \log 2}$,

(2.50')  $\lim_{N \to +\infty} \frac{\alpha(N)}{N} = 1 - \frac{\log 3}{2 \log 2} \simeq 0.2075$.


This result can be viewed as a consequence of Theorem 2.3, but it can also be proved
directly by using some properties of $P_N(y)$. Let us write (2.36) in the following form:


(2.51)


From (2.37) we see that $P_N(\frac{1}{2}) = 2^{N-1}$; since $P_N$ is an increasing function between 0
and 1, we have

(2.52)  $P_N(y) \le [\max(4y, 2)]^{N-1} = |g(y)|^{N-1}$.

It is now trivial to check that (2.49) and (2.49') are satisfied if we replace $P_N(y)$ by $g(y)$.
The same argument used in the proof of Theorem 2.3 then leads to

(2.53)  $b(N) \le \frac{N-1}{2 \log 2} \log |g(\frac{3}{4})| = \frac{N-1}{2 \log 2} \log 3$,

but from (2.43) we get

(2.54)  $b(N) \ge \frac{1}{2 \log 2} \log P_N(\frac{3}{4}) \ge \frac{1}{2 \log 2} \log \left[ \binom{2N-2}{N-1} \left( \frac{3}{4} \right)^{N-1} \right]$.

This proves the limit (2.50), and consequently (2.50') since the decay index of the Fourier
transform is equivalent to the Hölder exponent when both tend to $+\infty$. ■
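The asymptotic ratio can be observed numerically. The sketch below assumes, as stated in the text, that (2.45) applies to the whole family (2.38)-(2.39), so that $b(N) = \log P_N(3/4) / (2 \log 2)$; the helper names are ours.

```python
import math
from math import comb

def P(N, y):
    """P_N(y) from (2.36)."""
    return sum(comb(N - 1 + j, j) * y**j for j in range(N))

def critical_exponent(N):
    """b(N) = log|p_N(2*pi/3)| / log 2, with |p_N(2*pi/3)|^2 = P_N(3/4)."""
    return math.log(P(N, 0.75)) / (2.0 * math.log(2.0))

limit = math.log(3.0) / (2.0 * math.log(2.0))  # about 0.7925

for N in (2, 10, 50, 200):
    b = critical_exponent(N)
    # columns: N, b(N), b(N)/N, and the fraction 1 - b(N)/N of N left for the decay exponent
    print(N, round(b, 4), round(b / N, 4), round(1.0 - b / N, 4))
```

For $N = 2$ this gives $b \approx 0.661$, i.e. a Fourier decay exponent $N - b \approx 1.339$; the convergence of $b(N)/N$ towards the limit is slow, in agreement with the binomial factor in (2.54).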


The use of fixed points for optimal estimations of the spectral decay is thus very efficient
when one is looking for arbitrarily high regularity since a sharp asymptotical result is
obtained. For small filters, this method does not give a good result because the error on the
exact regularity may have the same order as the value of the Hölder exponent itself. For
such filters, other methods, which take advantage of the small number of taps in the filter,
can be used to derive more precise estimations. We now describe these methods; they are
typically based on matrix computations.


II.4 Regularity: Matrix based sharper estimates

II.4.a The Littlewood-Paley approach

We first recall some aspects of the Littlewood-Paley theory. Let $\gamma(x)$ be a real-valued,
symmetric function of the Schwartz class $\mathcal{S}(\mathbb{R})$, which satisfies

f  7(‘-’)  =  0  if  |u;|  <  5  or  |u;|  >  § 

(2-55)  <  ,  ,  .  . 

[  >0  if  ^  <  k|  <  f 

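so that the frequency axis is covered by the dyadic dilations of $\hat\gamma$. Indeed, we have

(2.56)  $0 < C_1 \le \sum_{j=-\infty}^{+\infty} \hat\gamma(2^j\omega) \le C_2$  if $\omega \ne 0$.

Define for any $f$ in $\mathcal{S}'(\mathbb{R})$ the dyadic blocks $\Delta_j(f)$ by

(2.57)  $\Delta_j(f) = 2^j \gamma(2^j \cdot) * f$,  i.e.  $\widehat{\Delta_j(f)} = \hat\gamma(2^{-j}\cdot)\,\hat f$.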

The Littlewood-Paley theory tells us that several functional spaces can be characterized by
examining only the norm of these blocks. This is the case in particular for the Sobolev
spaces $H^s$ and the Hölder spaces $C^\alpha$, $\alpha > 0$. To do this, it is necessary to change slightly
the definition of $C^\alpha$ when $\alpha$ is an integer; we shall say that a bounded function $f$ is in $C^n$ if
and only if $f^{(n-1)}$ belongs to the Zygmund class $\Lambda$, i.e. there exists a constant $C$ such that,
for all $x$ and $y$, we have

(2.58)  $|f^{(n-1)}(x + y) + f^{(n-1)}(x - y) - 2 f^{(n-1)}(x)| \le C |y|$.

With this convention, the Hölder space $C^\alpha$ is characterized by the following conditions,

(2.59)  $\|\Delta_j(f)\|_{L^\infty} \le C\,2^{-\alpha j}$  when $j \ge 0$,

(2.59')  $f$ is a bounded continuous function.

Note  that  the  choice  (2.55)  for  7  is  arbitrary  and  that  more  general  functions  could  be 
chosen  to  divide  the  Fourier  domain  into  dyadic  blocks.  To  derive  these  types  of  estimates 
on  the  scaling  function  we  introduce  a  tool  which  will  be  very  useful  in  the  bidimensional 
case. 

Definition 2.1 Let $L^2[0, 2\pi]$ be the space of $2\pi$-periodic, square integrable functions on
$[0, 2\pi]$, and $C[0, 2\pi]$ the space of $2\pi$-periodic continuous functions. Then, for any $m(\omega)$ in
$C[0, 2\pi]$, we define the transition operator $T_m$ associated to $m(\omega)$ by

(2.60)  $T_m : L^2[0, 2\pi] \to L^2[0, 2\pi]$,  $f \mapsto T_m f(\omega) = m(\frac{\omega}{2}) f(\frac{\omega}{2}) + m(\frac{\omega}{2} + \pi) f(\frac{\omega}{2} + \pi)$.
Note that when $m(\omega)$ is a trigonometric polynomial, the study of $T_m$ can be made in a
finite dimensional space. More precisely, if we define

(2.61)  $E(N_1, N_2) = \left\{ \sum_{n=N_1}^{N_2} h_n e^{in\omega} \;\middle|\; (h_{N_1}, \dots, h_{N_2}) \in \mathbb{C}^{N_2 - N_1 + 1} \right\}$

then we have clearly

(2.62)  $(f, m) \in [E(N_1, N_2)]^2 \Rightarrow T_m f \in E(N_1, N_2)$.

This is due to the contraction $\omega \mapsto \frac{\omega}{2}$ which appears in the definition (2.60) of $T_m$. If $c_n$
is the $n$-th Fourier coefficient of $m(\omega)$, then the matrix of $T_m$ in the complex exponentials
basis is given by

(2.63)  $T_{\ell,n} = (2 c_{2\ell - n})$.

The size of this matrix $T$ in $E(N_1, N_2)$ is $L \times L$ with $L = N_2 - N_1 + 1$. This operator
has been studied by J. P. Conze and A. Raugi and several ideas presented below are due to
their work [CR], [Con]. We shall use it to derive Littlewood-Paley type estimates for
the Hölder continuity of the scaling function. For this, we need the following result:

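Lemma 2.5 For all $n > 0$,

(2.64)  $\int_{-\pi}^{\pi} (T_m)^n f(\omega)\,d\omega = \int_{-2^n\pi}^{2^n\pi} f(2^{-n}\omega) \prod_{k=1}^{n} m(2^{-k}\omega)\,d\omega$.

Proof:

We prove it by induction. It is clear for $n = 1$ since

$\int_{-\pi}^{\pi} T_m f(\omega)\,d\omega = \int_{-\pi}^{\pi} \left[ m(\frac{\omega}{2}) f(\frac{\omega}{2}) + m(\frac{\omega}{2} + \pi) f(\frac{\omega}{2} + \pi) \right] d\omega$

$= 2 \int_{-\pi/2}^{\pi/2} \left[ m(\omega) f(\omega) + m(\omega + \pi) f(\omega + \pi) \right] d\omega$

$= 2 \int_{-\pi}^{\pi} m(\omega) f(\omega)\,d\omega = \int_{-2\pi}^{2\pi} m(\frac{\omega}{2}) f(\frac{\omega}{2})\,d\omega$.

Assuming (2.64) for $n$, we obtain at the next step,

$\int_{-\pi}^{\pi} (T_m)^{n+1} f(\omega)\,d\omega = \int_{-\pi}^{\pi} (T_m)^n\,T_m f(\omega)\,d\omega$

$= \int_{-2^n\pi}^{2^n\pi} \left[ \prod_{k=1}^{n} m(2^{-k}\omega) \right] \left[ m(2^{-n-1}\omega) f(2^{-n-1}\omega) + m(2^{-n-1}\omega + \pi) f(2^{-n-1}\omega + \pi) \right] d\omega$

$= 2^{n+1} \int_{-\pi/2}^{\pi/2} \left[ \prod_{k=1}^{n} m(2^{k}\omega) \right] \left[ m(\omega) f(\omega) + m(\omega + \pi) f(\omega + \pi) \right] d\omega$

$= \int_{-2^{n+1}\pi}^{2^{n+1}\pi} \left[ \prod_{k=1}^{n+1} m(2^{-k}\omega) \right] f(2^{-n-1}\omega)\,d\omega$.

This concludes the proof. ■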

We now suppose that $m(\omega)$ is a positive trigonometric polynomial in $E_M = E(-M, M)$
and that $m(0) = 1$ and $m(\pi) = 0$. Then $m$ can be factorized as

(2.65)  $m(\omega) = \left[ \cos^2\left(\frac{\omega}{2}\right) \right]^N p(\omega)$

where $p(\omega)$ is a trigonometric polynomial that does not vanish for $\omega = \pi$. Note that
necessarily $N \le M$. From this cancellation property, we can derive:

Lemma 2.6 $\{1, \frac{1}{2}, \dots, 2^{-(2N-1)}\}$ are eigenvalues of $T_m$. The row vectors $p_j = (n^j)_{n=-M,\dots,M}$,
for $0 \le j \le 2N - 1$, generate a subspace which is left invariant by $T_m$ and contains one eigenvector
for each of these $2N$ eigenvalues.

Consequently, the orthogonal subspace defined by

(2.66)  $F_N = \left\{ \sum_{n=-M}^{M} h_n e^{in\omega} \;\middle|\; \sum_{n=-M}^{M} n^j h_n = 0,\; j = 0, \dots, 2N-1 \right\}$

is right invariant by $T_m$.

Proof:

The factorization in (2.65) is equivalent to the cancellation rules

(2.67)  $\sum_{n=-M}^{M} (-1)^n n^j c_n = 0$  for $j = 0, \dots, 2N - 1$.

In particular, for $j = 0$, we have

(2.68)  $\sum_n c_{2n} = \sum_n c_{2n+1} = \frac{1}{2}$  (because $m(0) = 1$).

This means that the sums of each column in the matrix of $T$ (2.63) are equal to 1 and that
$p_0 = (1, \dots, 1)$ is a left eigenvector for the eigenvalue 1. For $0 < j \le 2N - 1$ we define
$q_j = p_j T = (q_j^{-M}, \dots, q_j^{M})$; we have,

(2.69)  $q_j^\ell = 2 \sum_n n^j c_{2n - \ell}$.

Thus, if $\ell$ is even

(2.70)  $q_j^\ell = 2 \sum_n \left( n + \frac{\ell}{2} \right)^j c_{2n}$

and if $\ell$ is odd

(2.70')  $q_j^\ell = 2 \sum_n \left( n + \frac{\ell + 1}{2} \right)^j c_{2n+1}$.

Using the binomial formula and the cancellation rules (2.67), we see that $q_j$ is a linear
combination of $p_k$ for $k = 0, \dots, j$. The coefficient of $p_j$ is given by the last term of the
binomial and is thus equal to $2^{-j}$. Consequently $\{p_j\}_{j=0,\dots,2N-1}$ is a triangular basis for the
left action of $T_m$ and the eigenvalues are $\{2^{-j}\}_{j=0,\dots,2N-1}$. ■
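The statements of Lemma 2.6 can be checked on a small example. The sketch below builds the matrix (2.63) for $m(\omega) = \cos^4(\omega/2)$, i.e. $N = 2$, $p = 1$ and $M = 2$ (our choice of example), using exact rational arithmetic; the function names are ours. Since $M = N$ in this example, the subspace (2.66) is one-dimensional, so its eigenvalue can be read off from the trace.

```python
from fractions import Fraction

def transition_matrix(c, M):
    """Matrix T[l][n] = 2 c_{2l-n} of T_m on E(-M, M), cf. (2.63); c maps index -> coefficient."""
    idx = range(-M, M + 1)
    return [[2 * c.get(2 * l - n, Fraction(0)) for n in idx] for l in idx]

def left_action(row, T):
    """Row vector times matrix."""
    size = len(T)
    return [sum(row[l] * T[l][n] for l in range(size)) for n in range(size)]

# m(w) = cos^4(w/2): Fourier coefficients binomial(4, k) / 16 on the indices -2..2.
M, N = 2, 2
c = {-2: Fraction(1, 16), -1: Fraction(4, 16), 0: Fraction(6, 16),
     1: Fraction(4, 16), 2: Fraction(1, 16)}
T = transition_matrix(c, M)

p = [[Fraction(n) ** j for n in range(-M, M + 1)] for j in range(2 * N)]  # p_j = (n^j)_n
q = [left_action(pj, T) for pj in p]                                      # q_j = p_j T

# Remaining eigenvalue (on the one-dimensional space F_N): trace minus 1 + 1/2 + 1/4 + 1/8.
lam = sum(T[i][i] for i in range(2 * M + 1)) - sum(Fraction(1, 2**j) for j in range(2 * N))
```

Here `lam` equals $1/8$, so $-\log(\lambda)/\log 2 = 3$, which is the expected value for the cubic B-spline generated by this filter.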

We now come back to the scaling function given by the infinite product

(2.71)  $\hat\varphi(\omega) = \prod_{k=1}^{+\infty} m(2^{-k}\omega)$.

Theorem 2.7 Let $F_N$ be the invariant subspace of $T_m$ defined by (2.66). If $\lambda$ is the largest
eigenvalue of $T_m$ restricted to $F_N$ and if $|\lambda| < 1$, then, defining $\alpha = -\frac{\log(\lambda)}{\log 2}$ ($> 0$),
we have,

•  $\varphi$ is in $C^{\alpha - \epsilon}$ for all $\epsilon > 0$

•  $\varphi$ is in $C^\alpha$ if the restriction of $T_m$ to the invariant subspace $F_\lambda$ of eigenvalue $\lambda$ is
purely diagonal (i.e. $T_m|_{F_\lambda} = \lambda I$).

These two estimates are optimal if $\hat\varphi(\omega)$ does not vanish on $[-\pi, \pi]$.


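Proof:

Consider the trigonometric polynomial

(2.72)  $C_N(\omega) = (1 - \cos\omega)^N$.

It clearly belongs to $F_N$.

Consequently, for all $n > 0$,

(2.73)  $\int_{-\pi}^{\pi} (T_m)^n C_N(\omega)\,d\omega \le (2\pi)^{1/2} \left( \int_{-\pi}^{\pi} |(T_m)^n C_N(\omega)|^2\,d\omega \right)^{1/2} \le C(\lambda + \epsilon)^n$  (or $C\lambda^n$ if $T_m|_{F_\lambda} = \lambda I$).

We now use Lemma 2.5 combined with the inequality

(2.74)  $C_N(\omega) \ge 1$  when $\frac{\pi}{2} \le |\omega| \le \pi$.

This leads us to

$\int_{2^{n-1}\pi \le |\omega| \le 2^n\pi} \hat\varphi(\omega)\,d\omega \le C \int_{2^{n-1}\pi \le |\omega| \le 2^n\pi} \prod_{k=1}^{n} m(2^{-k}\omega)\,d\omega$

$\le C \int_{-2^n\pi}^{2^n\pi} C_N(2^{-n}\omega) \prod_{k=1}^{n} m(2^{-k}\omega)\,d\omega$

$= C \int_{-\pi}^{\pi} (T_m)^n C_N(\omega)\,d\omega$.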
Consequently the Littlewood-Paley blocks satisfy the inequality

(2.75)  $\|\widehat{\Delta_j(\varphi)}\|_{L^1} \le C\,2^{-(\alpha - \epsilon) j}$,

(2.75')  $\|\widehat{\Delta_j(\varphi)}\|_{L^1} \le C\,2^{-\alpha j}$  if $T_m|_{F_\lambda}$ is purely diagonal.

Since $\|\Delta_j(\varphi)\|_{L^\infty} \le \|\widehat{\Delta_j(\varphi)}\|_{L^1}$ we obtain the announced regularity.

To prove that these estimates are optimal, we need to reverse all the inequalities which
have been used. First, note that since $m(\omega)$ and $\hat\gamma(\omega)$ are positive, we have
$\|\Delta_j(\varphi)\|_{L^\infty} = \|\widehat{\Delta_j(\varphi)}\|_{L^1}$.

Let $f_\lambda$ be an eigenfunction in $F_\lambda$. We have

(2.76)  $\int_{-2^n\pi}^{2^n\pi} f_\lambda(2^{-n}\omega) \prod_{k=1}^{n} m(2^{-k}\omega)\,d\omega = \int_{-\pi}^{\pi} (T_m)^n f_\lambda(\omega)\,d\omega = \lambda^n \int_{-\pi}^{\pi} f_\lambda(\omega)\,d\omega \ge C\lambda^n$.

Note that we have supposed that $\int_{-\pi}^{\pi} f_\lambda(\omega)\,d\omega \ne 0$. If $\int_{-\pi}^{\pi} f_\lambda(\omega)\,d\omega = 0$, then the argument
has to be modified slightly; see below (after (2.78)). Since we have supposed that $\hat\varphi(\omega)$ does
not vanish on $[-\pi, \pi]$, we have

(2.77)  $\hat\varphi(\omega) \ge C \prod_{k=1}^{n} m(2^{-k}\omega)$  for all $n > 0$ and $|\omega| \le 2^n\pi$.

Note that this hypothesis corresponds to the condition of Theorem 2.1 with $K = [-\pi, \pi]$.
In a more general setting, we could replace the integrals on $[-2^n\pi, 2^n\pi]$ by integrals on
$2^n K$ and the same results would hold. Combining (2.76) and (2.77) gives

(2.78)  $\int_{-2^n\pi}^{2^n\pi} |\hat\varphi(\omega)|\,|f_\lambda(2^{-n}\omega)|\,d\omega \ge C\lambda^n$.

(If $\int_{-\pi}^{\pi} f_\lambda(\omega)\,d\omega = 0$, then a slightly more sophisticated argument will do the trick.
Lemma 2.5 still holds if the measure $d\omega$ is replaced by any other measure of the type
$g(\omega)\,d\omega$ where $g$ is a $2\pi$-periodic, strictly positive, continuous function. We can always
choose $g$ such that

$\int_{-\pi}^{\pi} f_\lambda(\omega)\,g(\omega)\,d\omega \ne 0$;

(2.76) then holds if $d\omega$ is replaced everywhere by $g(\omega)\,d\omega$. Since $g$ is strictly positive, this
modified version of (2.76), combined with (2.77), still implies (2.78).)

Since $f_\lambda$ has a zero of order $2N$ at the origin, the function $\gamma(x)$ defined by
$\hat\gamma(\omega) = |f_\lambda(\omega)|\,\chi_{[-\pi,\pi]}(\omega)$ is convenient for the Littlewood-Paley analysis of Hölder regularity
less than $2N$. This is the case for $\varphi$ since $2N + 1$ vanishing moments would be necessary
for a higher Hölder exponent than $2N$ (see [SF], [DL] or [DyL]). Consequently (2.78) tells
us that $\varphi$ cannot be more regular than $C^\alpha$. To prove the optimality of $C^{\alpha - \epsilon}$ when $T_m|_{F_\lambda}$ is
not purely diagonal, it suffices to replace $f_\lambda$ by a function $g_\lambda$ such that $T_m g_\lambda = \lambda g_\lambda + \mu f_\lambda$
with $\mu \ne 0$. This leads to

(2.78')  $\int_{-2^n\pi}^{2^n\pi} |\hat\varphi(\omega)|\,|g_\lambda(2^{-n}\omega)|\,d\omega \ge C n \lambda^n$

which proves the optimality of $C^{\alpha - \epsilon}$.

The theorem is thus completely proved. ■


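Remarks:

•  The estimates (2.75) and (2.75') can be found by an equivalent technique, using the
transition operator $T_p$ corresponding to the factor $p(\omega)$ in (2.65). We simply consider
the largest eigenvalue $\lambda_p$ and iterate $T_p$ on $f = 1$. This leads to

$\int_{2^{n-1}\pi \le |\omega| \le 2^n\pi} \hat\varphi(\omega)\,d\omega \le C\,2^{-2Nn} \int_{2^{n-1}\pi \le |\omega| \le 2^n\pi} \prod_{k=1}^{n} p(2^{-k}\omega)\,d\omega$

$\le C\,2^{-2Nn} \int_{-\pi}^{\pi} (T_p)^n 1\,d\omega$

$\le C(\lambda_p + \epsilon)^n\,2^{-2Nn}$  (or $C\lambda_p^n\,2^{-2Nn}$ if $T_p|_{F_{\lambda_p}} = \lambda_p I$)

and thus $\varphi \in C^{\alpha - \epsilon}$ with $\alpha = 2N - \frac{\log(\lambda_p)}{\log 2}$. This estimate is in fact the same as (2.75).
Indeed, if $\mu$ is an eigenvalue of $T_m$ in $F_N$ then its associated eigenfunction can be
written as

(2.79)  $f_\mu(\omega) = \left[ \sin^2\left(\frac{\omega}{2}\right) \right]^N g_\mu(\omega)$.

Replacing $m(\omega)$ by its factorized form in

(2.80)  $\mu f_\mu(\omega) = f_\mu(\frac{\omega}{2})\,m(\frac{\omega}{2}) + f_\mu(\frac{\omega}{2} + \pi)\,m(\frac{\omega}{2} + \pi)$

we obtain, after dividing by $\left[ \sin^2(\frac{\omega}{4}) \cos^2(\frac{\omega}{4}) \right]^N$,

(2.81)  $\mu\,2^{2N} g_\mu(\omega) = g_\mu(\frac{\omega}{2})\,p(\frac{\omega}{2}) + g_\mu(\frac{\omega}{2} + \pi)\,p(\frac{\omega}{2} + \pi)$.

We see here that the eigenvalues of $T_p$ are exactly given by $\mu_p = 2^{2N}\mu$. This proves
the equivalence between the two techniques.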

•  In general $m(\omega)$ is not a positive function. One can then define $M(\omega) = |m(\omega)|^2$ and
use the operator $T_M$ associated to $M(\omega)$. The result is an estimate of the $L^2$ norms
of $\widehat{\Delta_j(\varphi)}$. Using the Cauchy-Schwarz inequality, we derive the following corollary.

Corollary 2.8 Suppose that $M(\omega) = |m(\omega)|^2$ has a zero of order $2N$ at $\omega = \pi$. Define $\lambda$,
the largest eigenvalue of $T_M$ on $F_N$, and $\alpha = -\frac{\log(\lambda)}{2 \log 2}$. Then, $\varphi \in H^{\alpha - \epsilon} \subset C^{\alpha - 1/2 - \epsilon}$,
where $H^s$ is the Sobolev space of index $s$. The value $\alpha$ is attained if $T_M|_{F_\lambda} = \lambda I$.
Note that the Hölder exponent has no chance of being optimal because we have used
the Cauchy-Schwarz inequality and $\hat\varphi$ is not a positive function. The Sobolev exponent
however is optimal. The regularity of compactly supported wavelets was estimated with
this method in [Dau1].

The transition operator also plays a crucial role in the biorthogonal wavelet theory: we
show in Appendix A how it can be used to prove that the families $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$ and $\{\tilde\psi_{j,k}\}_{j,k \in \mathbb{Z}}$

are  unconditional  bases,  with  weaker  assumption  than  the  boundedness  of 
(1  +  imposed  in  Theorem  2.2. 

The optimal global and local Hölder regularity of any wavelet can be
estimated by another method developed by I. Daubechies and J. Lagarias in [DL]. We now
recall its main points.

II.4.b The time domain approach

Let $m(\omega) = \sum_{n=0}^{N} c_n e^{in\omega}$ be a trigonometric polynomial such that $m(0) = 1$ and $m(\pi) = 0$.
We do not require that $m(\omega)$ be positive. Let $\varphi(x)$ be the scaling function defined by the
infinite product (2.71). It is at least a compactly supported distribution in $[0, N]$.

In the time domain approach, we represent $\varphi(x)$ by its "vector" form $w(x) : [0, 1] \to \mathbb{R}^N$,

(2.82)  $[w(x)]_n = \varphi(x + n - 1)$,  $n = 1, \dots, N$.

From the two scale difference equation (1.5) we get

(2.83)  $w(x) = \begin{cases} T_0\,w(2x) & \text{if } x \le \frac{1}{2} \\ T_1\,w(2x - 1) & \text{if } x \ge \frac{1}{2} \end{cases}$

where $T_0$ and $T_1$ are $N \times N$ matrices defined by

(2.84)  $(T_0)_{i,j} = c_{2i-j-1}$,  $1 \le i, j \le N$,

(2.84')  $(T_1)_{i,j} = c_{2i-j}$.

Using the notations

$d_n(x)$ = $n$-th binary digit of $x \in [0, 1]$,

$\tau(x) = \begin{cases} 2x & \text{if } x < \frac{1}{2} \\ 2x - 1 & \text{if } x \ge \frac{1}{2} \end{cases}$  (binary shift),

we can rewrite (2.83) as a "fixed point" equation

(2.85)  $w(x) = T_{d_1(x)}\,w(\tau(x))$.

This leads to an evaluation of $w(x)$ and its derivative by an iterative process. The regularity
of the result depends of course on the spectral properties of $T_0$ and $T_1$. Note that when
$m(\omega)$ has a zero of order $L$ (as for the transition operator studied in the previous section),
then the space $F_L$ orthogonal to the vectors $p_j = (n^j)_{n=1,\dots,N}$ for $j = 0, \dots, L-1$ is invariant
by $T_0$ and $T_1$. This method gives sharp estimates on the local regularity in $x$ by considering
the products $T_{d_1(x)} \cdots T_{d_n(x)}$ for all $n > 0$. The main result on global regularity proved in
[DL, Theorem 3.1] is the following.

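Theorem 2.9 Suppose that there exists $\rho < 1$ such that, for all binary sequences $(d_j)_{j \in \mathbb{Z}}$ and
all $m > 0$, we have

(2.86)  $\| T_{d_1} T_{d_2} \cdots T_{d_m}|_{F_L} \| \le C \rho^m$.

Define $\alpha = -\frac{\log \rho}{\log 2}$. Then,

•  if $\alpha$ is not an integer, $\varphi$ belongs to $C^\alpha$

•  if $\alpha$ is an integer, $\varphi^{(\alpha - 1)}$ is almost Lipschitz:

for all $(x, t)$,  $|\varphi^{(\alpha-1)}(x + t) - \varphi^{(\alpha-1)}(x)| \le C |t|\,|\log |t||$.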
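As an illustration, the matrices (2.84), (2.84') and the products appearing in (2.86) can be formed explicitly for a short filter. The sketch below makes several assumptions of ours: it uses the four-tap orthonormal filter with the normalization of [DL] (coefficients $h_n$ summing to 2), it restricts $T_0$, $T_1$ only to the space of vectors orthogonal to $(1, 1, 1)$, and it measures the products in the induced 1-norm. The value it returns is an upper bound for the rate $\rho$, hence only a lower bound for the exponent.

```python
import math
from itertools import product

r3 = math.sqrt(3.0)
h = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]  # sum(h) == 2 (assumed normalization)
N = len(h) - 1

def coeff(n):
    return h[n] if 0 <= n < len(h) else 0.0

# T[0]_{ij} = h_{2i-j-1}, T[1]_{ij} = h_{2i-j}, 1 <= i, j <= N, cf. (2.84), (2.84')
T = [[[coeff(2 * i - j - 1 + d) for j in range(1, N + 1)] for i in range(1, N + 1)]
     for d in (0, 1)]

def restrict(A):
    """2x2 matrix of A on {v : v1 + v2 + v3 = 0} in the basis u1 = (1,-1,0), u2 = (0,1,-1)."""
    cols = []
    for u in ((1.0, -1.0, 0.0), (0.0, 1.0, -1.0)):
        v = [sum(A[i][j] * u[j] for j in range(3)) for i in range(3)]
        cols.append((v[0], v[0] + v[1]))  # v = v1*u1 + (v1+v2)*u2 when v1 + v2 + v3 = 0
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def norm1(A):
    """Induced 1-norm (maximum absolute column sum)."""
    return max(abs(A[0][j]) + abs(A[1][j]) for j in range(2))

R = [restrict(T[0]), restrict(T[1])]

def rho_estimate(m):
    """Max over binary words d_1..d_m of the norm of the restricted product, to the power 1/m."""
    best = 0.0
    for word in product((0, 1), repeat=m):
        A = [[1.0, 0.0], [0.0, 1.0]]
        for d in word:
            A = matmul(A, R[d])
        best = max(best, norm1(A))
    return best ** (1.0 / m)

rho = rho_estimate(10)
alpha_lower = -math.log(rho) / math.log(2.0)  # lower bound for the Hölder exponent
```

The number of products doubles with each additional factor, which is one reason why the method is practicable only for small filters.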

Remark:

•  The "generalized spectral norm"

(2.87)  $\rho(T_0, T_1) = \limsup_{m \to \infty}\; \max_{d_j = 0 \text{ or } 1} \| T_{d_1} \cdots T_{d_m}|_{F_L} \|^{1/m}$

gives a sharp estimate of the global regularity. Note that it is in general superior to
the spectral radius of $T_0$ and $T_1$. When $N$ is not too large it is possible to compute
the exact value of $\rho(T_0, T_1)$. For example, in the case of orthonormal wavelets, the
optimal Hölder exponent was found in [DL] for $N$ = 4, 6 and 8. The same evaluation
becomes more difficult for larger filters.

•  The generalization of this approach in higher dimensions is not trivial. In particular,
it involves nonstandard binary expansions depending on the dilation matrix which is
used. We describe these techniques in Appendix B.

As a conclusion of this review of regularity estimators, we could say that these three
approaches are complementary: the time domain method gives sharp results but it is only
practicable for small filters, the Littlewood-Paley estimates can be derived for longer filters
but they will be optimal only if $m(\omega)$ is a positive function and finally, the Fourier approach
is less precise but appropriate to asymptotical results on very large filters. Let us also
mention that another method recently developed by O. Rioul [Ri] and based on
norm estimates of the iterated filters leads to interesting results; in particular, it is still
manageable for larger filters than the time domain method of [DL].

We are now ready to deal with the bidimensional wavelets. We start by examining the
different subband coding schemes that can be used to build these non-separable multiscale
bases.
III Two Channel Bidimensional Subband Coding Schemes

As mentioned previously, we shall concentrate on the dilation matrices of determinant equal
to 2 or -2. In such conditions, the subband coding scheme that we consider splits the signal
in two channels (instead of four in the separable case) and only one wavelet is then necessary
to characterize the detail coefficients at each scale. We first present a short summary of
the equations satisfied by these filters. They are immediate generalizations of the results
presented in II.1.


III.1 General conditions for exact reconstruction

As in the one dimensional case, the scheme that we are considering here is based on four
fundamental operations:

•  The action of two analyzing filters, one low pass $\tilde M_0(\omega) = \tilde M_0(\omega_1, \omega_2)$ and one high
pass $\tilde M_1(\omega) = \tilde M_1(\omega_1, \omega_2)$

•  Decimation on each channel by keeping only the samples on the sublattice $\Gamma = D\mathbb{Z}^2$

•  Insertion of zero values at the intermediate points of $\mathbb{Z}^2 / \Gamma$

•  Interpolation by two synthesis filters, one low pass $M_0(\omega) = M_0(\omega_1, \omega_2)$ and one
high pass $M_1(\omega) = M_1(\omega_1, \omega_2)$, followed by reconstruction of the original signal by
summation.

We see here that the conditions for perfect reconstruction will not depend on the dilation
matrix $D$ but only on the sublattice $\Gamma = D\mathbb{Z}^2$ that is generated (different matrices may
lead to the same $\Gamma$). More precisely, there exist only two types of grid corresponding to a
decimation of a factor 2 in $\mathbb{Z}^2$:

•  The quincunx sublattice, shown on figure 5, is generated by the integer combinations
of (1,1) and (1,-1).

•  The column sublattice, shown on figure 6, is generated by the integer combinations
of (0,1) and (2,0). It is of course equivalent to the row sublattice, by exchange of the
coordinates.
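The two sublattices are easy to enumerate from the generators given above. The following sketch (with helper names of our own) lists the points of each sublattice inside a small box and compares them with the parity descriptions "$x + y$ even" and "$x$ even".

```python
def lattice_points(gen1, gen2, radius, coeff_range=12):
    """Integer combinations a*gen1 + b*gen2 that fall in the box |x|, |y| <= radius."""
    pts = set()
    for a in range(-coeff_range, coeff_range + 1):
        for b in range(-coeff_range, coeff_range + 1):
            x = a * gen1[0] + b * gen2[0]
            y = a * gen1[1] + b * gen2[1]
            if abs(x) <= radius and abs(y) <= radius:
                pts.add((x, y))
    return pts

box = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]
quincunx = lattice_points((1, 1), (1, -1), 4)   # generators of the quincunx sublattice
column = lattice_points((0, 1), (2, 0), 4)      # generators of the column sublattice

quincunx_by_parity = {(x, y) for (x, y) in box if (x + y) % 2 == 0}
column_by_parity = {(x, y) for (x, y) in box if x % 2 == 0}
```

Both sublattices keep roughly one point out of two, which is the decimation by a factor 2 mentioned in the text.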

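The same arguments that were used in II.1.b show that perfect reconstruction is achieved
by FIR filters, if and only if they satisfy (up to a shift) the following equations, which are
similar to (2.13) and (2.14).

•  In the quincunx case,

(3.1)  $\overline{M_0(\omega)}\,\tilde M_0(\omega) + \overline{M_0(\omega + (\pi, \pi))}\,\tilde M_0(\omega + (\pi, \pi)) = 1$

and

(3.2)  $\tilde M_1(\omega) = e^{-i\omega_1}\,\overline{M_0(\omega + (\pi, \pi))}$,  $M_1(\omega) = e^{-i\omega_1}\,\overline{\tilde M_0(\omega + (\pi, \pi))}$.

•  In the column case,

(3.3)  $\overline{M_0(\omega)}\,\tilde M_0(\omega) + \overline{M_0(\omega + (\pi, 0))}\,\tilde M_0(\omega + (\pi, 0)) = 1$

and

(3.4)  $\tilde M_1(\omega) = e^{-i\omega_1}\,\overline{M_0(\omega + (\pi, 0))}$,  $M_1(\omega) = e^{-i\omega_1}\,\overline{\tilde M_0(\omega + (\pi, 0))}$.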
Figure 5: Quincunx decimation

Figure 6: Column decimation

If the analysis and synthesis filters are equal, we find two generalizations of the CQF condition
(2.5). The formulas (3.1) and (3.2) become

(3.5)  $|M_0(\omega)|^2 + |M_0(\omega + (\pi, \pi))|^2 = 1$,  $M_1(\omega) = e^{-i\omega_1}\,\overline{M_0(\omega + (\pi, \pi))}$;

whereas (3.3) and (3.4) become

(3.6)  $|M_0(\omega)|^2 + |M_0(\omega + (\pi, 0))|^2 = 1$,  $M_1(\omega) = e^{-i\omega_1}\,\overline{M_0(\omega + (\pi, 0))}$.

As in the one dimensional situation, we want to build from these schemes the associated
scaling function which can be viewed as the limit of the cascade-reconstruction algorithm.

III.2 Non-separable scaling function and wavelets

If $c_{m,n}$ are the Fourier coefficients of $M_0(\omega)$, i.e.

(3.7)  $M_0(\omega) = M_0(\omega_1, \omega_2) = \sum_{m,n} c_{m,n}\,e^{i(m\omega_1 + n\omega_2)}$,

then the associated scaling function $\phi(x) = \phi(x_1, x_2)$ satisfies a two scale difference
equation

(3.8)  $\phi(x) = 2 \sum_{m,n} c_{m,n}\,\phi(Dx - (m, n))$

and its Fourier transform can be expressed as an infinite product

(3.9)  $\hat\phi(\omega) = \prod_{k=1}^{+\infty} M_0(D^{-k}\omega)$

which is convergent if and only if $M_0(0) = 1$.

This scaling function has compact support if and only if $M_0(\omega)$ is an FIR filter. We
see from (3.9) that $\phi$ will be highly dependent on the choice of $D$. For the same sublattice
and the same filter, the results can be completely different for different $D$. The column
sublattice for example is generated by both matrices $D_1 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ and $D_2 =$
but the first one cannot lead to an $L^2$ scaling function. Indeed, we would have

$\hat\phi_1(0, 2n\pi) = \prod_{k=1}^{+\infty} M_0(D_1^{-k}(0, 2n\pi)) = 1$,

for all $n > 0$. But since $\phi_1$ is compactly supported and belongs to $L^2(\mathbb{R}^2)$, it is also in $L^1(\mathbb{R}^2)$
and its Fourier transform should tend to zero at infinity. We can also remark that only the
eigenvalues of $D_2$ have their modulus strictly superior to 1.

The choice of the dilation matrix is thus very important. In fact, although the equations (3.1)-(3.2) are different from (3.3)-(3.4), the choice of the sublattice is less important. Indeed, for any dilation matrix $D_1$ such that $D_1\mathbb{Z}^2$ is the column sublattice, we can define

(3.10)  $D_2 = P D_1 P^{-1}$  with  $P$ =

Clearly, the image of $\mathbb{Z}^2$ by $D_2$ is now the quincunx sublattice. Then, for any filter satisfying the column-CQF condition (3.6), the corresponding scaling function $\phi_1$ can be written in the following way:

$\hat\phi_1(\omega) = \prod_{k=1}^{+\infty} M_0(D_1^{-k}\omega) = \prod_{k=1}^{+\infty} M_0(P^{-1} D_2^{-k} P\omega) = \hat\phi_2(P\omega)$

where $\phi_2$ is also a scaling function defined by

$\hat\phi_2(\omega) = \prod_{k=1}^{+\infty} M_0(P^{-1} D_2^{-k}\omega)$ .

Since $P^{-1}$ =   we have

$|M_0(\omega_1,\, \omega_1+\omega_2)|^2 + |M_0(\omega_1+\pi,\, \omega_1+\omega_2+2\pi)|^2 = 1$ .

Thus $M_0(P^{-1}\,\cdot)$ satisfies the quincunx-CQF condition (3.5). A similar result holds of course if we start from two dual filters $M_0$ and $\tilde M_0$ which satisfy (3.3). This shows that the scaling functions associated to $D_1$ and $D_2$ are linked by the simple relation $\phi_2(x) = \phi_1(Px)$.

Consequently we can restrict our study to the quincunx case. More generally, if $D_1$ and $D_2$ satisfy

(3.12)  $D_2 = P D_1 P^{-1}$

where $P$ is a matrix having integer entries and determinant equal to 1, then we also have the same type of equivalence between the scaling functions. For this reason, we shall only consider the two simplest dilation matrices of determinant 2, which cannot be related as in (3.12) since they do not have the same eigenvalues:


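(3.13)  $R = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$   (rotation of $\frac{\pi}{4}$ and dilation of $\sqrt 2$)

and

(3.13')  $S = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$   (symmetry around $\frac{\pi}{8}$ and dilation of $\sqrt 2$)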
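The stated properties of the two dilation matrices are easy to check by hand or by machine. The following sketch assumes $R = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and $S = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ (the reading of (3.13) and (3.13') used here); the helper names are illustrative only.

```python
import cmath

R = ((1, -1), (1, 1))   # assumed form of (3.13): rotation by pi/4 and dilation by sqrt(2)
S = ((1, 1), (1, -1))   # assumed form of (3.13'): symmetry and dilation by sqrt(2)

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def eigenvalues(m):
    # Roots of the characteristic polynomial x^2 - trace*x + det of a 2x2 matrix.
    tr = m[0][0] + m[1][1]
    disc = cmath.sqrt(tr * tr - 4 * det(m))
    return ((tr + disc) / 2, (tr - disc) / 2)

def image_is_quincunx(m, radius=4):
    # Every image point m*(a, b) of an integer point must have an even coordinate sum.
    pts = [(m[0][0] * a + m[0][1] * b, m[1][0] * a + m[1][1] * b)
           for a in range(-radius, radius + 1) for b in range(-radius, radius + 1)]
    return all((x + y) % 2 == 0 for x, y in pts)

for name, m in (("R", R), ("S", S)):
    print(name, det(m), eigenvalues(m), image_is_quincunx(m))
```

Under these assumptions $R$ has the complex eigenvalues $1 \pm i$ and $S$ the real eigenvalues $\pm\sqrt 2$: the moduli agree but the spectra differ, which is why the two matrices cannot be conjugate as in (3.12).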


In both of these cases the image of $\mathbb{Z}^2$ is the quincunx sublattice. The wavelet $\psi$ is then defined by

(3.14)  $\hat\psi(D\omega) = M_1(\omega)\,\hat\phi(\omega), \qquad D = R$ or $S$,

where $M_1$ is defined by (3.5) in the orthogonal case, and by (3.2) in the biorthogonal case where we also have a dual wavelet defined by

(3.15)  $\hat{\tilde\psi}(D\omega) = \tilde M_1(\omega)\,\hat{\tilde\phi}(\omega), \qquad D = R$ or $S$ .

The goal is now to design filters leading to regular scaling functions and wavelets. We end this section by presenting two important families of filters. The regularity of the associated $\phi$, $\psi$, $\tilde\phi$ and $\tilde\psi$ will be estimated in sections IV and V by different techniques which all are natural generalizations of the one dimensional tools that we introduced previously.


III.3  Filter design

III.3.a  The orthonormal case

Recall (see [Dau1]) that in $\mathbb{R}$, the CQF filter can be designed in the following way, in order to obtain wavelets with an arbitrarily high regularity:

1) For a given number $N$ of vanishing moments, define $m_0$ by

(3.16)  $|m_0(\omega)|^2 = \cos^{2N}\left(\frac{\omega}{2}\right) P_N\left(\sin^2\frac{\omega}{2}\right)$

where $P_N(y)$ is a polynomial, solution of the Bezout problem

(3.17)  $y^N P_N(1-y) + (1-y)^N P_N(y) = 1$ .

The minimal degree choice is given by $P_N(y) = \sum_{j=0}^{N-1} \binom{N-1+j}{j} y^j$.

2) Find the function $m_0(\omega)$ by using the Riesz lemma, which guarantees that there exists a trigonometric polynomial solving (3.16).
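The Bezout identity of step 1 can be verified exactly with rational arithmetic. This is a minimal sketch assuming the minimal degree solution is $P_N(y) = \sum_{j=0}^{N-1} \binom{N-1+j}{j} y^j$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def bezout_poly(N):
    """Coefficients of P_N(y) = sum_{j<N} C(N-1+j, j) y^j, lowest degree first."""
    return [comb(N - 1 + j, j) for j in range(N)]

def evaluate(coeffs, y):
    return sum(c * y ** j for j, c in enumerate(coeffs))

def bezout_residual(N, y):
    # Left-hand side of the Bezout identity minus 1; it should vanish for every y.
    P = bezout_poly(N)
    return (1 - y) ** N * evaluate(P, y) + y ** N * evaluate(P, 1 - y) - 1

residuals = [bezout_residual(N, Fraction(k, 7)) for N in range(1, 7) for k in range(8)]
print(all(r == 0 for r in residuals))
```

For example, $N = 2$ gives $P_2(y) = 1 + 2y$, and $(1-y)^2(1+2y) + y^2(3-2y) = 1$ can be expanded by hand.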

Unfortunately, this last result does not generalize to higher dimensions. We thus have to find other means to build trigonometric polynomials which satisfy (3.5). One possible method is the "polyphase component" construction used by Vaidyanathan [Va] and M. Vetterli [Ve], [VK]. It is based on the remark that $M_0(\omega)$ satisfies (3.5) if and only if the polyphase matrix

(3.18)  $\begin{pmatrix} M_0(\omega) + M_0(\omega+(\pi,\pi)) & M_1(\omega) + M_1(\omega+(\pi,\pi)) \\ M_0(\omega) - M_0(\omega+(\pi,\pi)) & M_1(\omega) - M_1(\omega+(\pi,\pi)) \end{pmatrix}$

is unitary for all $\omega$. Since the product of two polyphase matrices is also a polyphase matrix for a third pair of filters, infinite families can be constructed by multiplying elementary building blocks of the type (3.18) as soon as we know some simple filters which satisfy (3.5). The disadvantage of this method is that it does not furnish the vanishing moments in a natural way. Recall (see [Me1]) that the $N$ times differentiability of the function $\psi$ implies

(3.19)  $|\hat\psi(\omega)| \leq C\,|\omega|^{N+1} \qquad (|\omega| \to 0)$

and thus $M_0(\omega)$ has necessarily a zero of order $N+1$ at the frequency $\omega = (\pi,\pi)$. This can also be viewed as the Strang-Fix condition (see [SF]) for the regularity of the scaling function $\phi$.

The simplest way to build such filters with $N$ arbitrarily high is to remark that if $m_0(\omega)$ is a 1D solution of the CQF equation (2.5), then the 2D filter defined by

(3.20)  $M_0(\omega) = M_0(\omega_1,\omega_2) = m_0(\omega_1)$

satisfies the equation (3.5). It is apparently a good candidate for building regular wavelets since it has the same order of cancellation in $(\pi,\pi)$ as $m_0(\omega)$ in $\pi$. This allows us to build an infinite family of filters with an arbitrarily high number of vanishing moments by posing

(3.21)  $M_0^N(\omega) = m_0^N(\omega_1)$

where $\{m_0^N(\omega)\}_{N>0}$ is the family of filters designed in [Dau1], defined by (2.35), (2.3 ) and (2.38). Note that the filter (3.21) has a unidimensional structure but, since the dilation $D$ contains either a rotation or a symmetry, the final analysis (using iterates of the filter) is performed in all the directions of the plane. In section IV, we shall take a closer look at the associated wavelets and their regularity. If $D = R$, then one can also derive another family of "almost" one-dimensional filters $M_0$ from unidimensional $m_0$ (they get again fanned out to other directions by applying $R$). Explicitly,


1 

2 


+  mo 


This construction corresponds to a filter with taps on two diagonals, $c_{n_1,n_2} = 0$ if $n_1 \neq n_2$ and $n_1 \neq -n_2+1$. It is easy to check that this $M_0$ satisfies (3.5) if $m_0$ satisfies $|m_0(\omega)|^2 + |m_0(\omega+\pi)|^2 = 1$. If $m_0(0) = 1$, $m_0(\pi) = 0$, then $M_0(\pi,\pi) = 0$ follows, so that $M_1$, as defined in (3.5), satisfies $M_1(0,0) = 0$, as it should. One easily checks, however, that $M_0(\pi,\pi)$ and $\nabla M_0(\pi,\pi)$ cannot both be zero for these examples, so that the corresponding bases cannot possibly be $C^1$. Only the small examples are therefore of any interest; it seems possible (numerical experiment) to construct a continuous $\phi$ corresponding to a 4-tap filter in this way.


III.3.b  The biorthogonal case

The filter design is clearly easier in the biorthogonal situation. One can start from a given filter $M_0(\omega)$ and find the dual $\tilde M_0(\omega)$ by solving linear equations.

In particular we can look for filters which have more isotropy than those of the family (3.21). Here, again, the one dimensional theory can help us to build families of filters in a simple way. Several examples of real and symmetrical dual filters have been designed by the authors and J. C. Feauveau in [CDF].

In these one dimensional constructions the symmetry allows us to use the variable $y = \sin^2\left(\frac{\omega}{2}\right)$ and to write the transfer functions as

(3.22)  $m_0(\omega) = p(y)$  and  $\tilde m_0(\omega) = \tilde p(y)$

where $p$ and $\tilde p$ are two polynomials satisfying

(3.23)  $p(y)\tilde p(y) + p(1-y)\tilde p(1-y) = 1$ .

In two dimensions, consider the variables $y_1 = \sin^2\left(\frac{\omega_1}{2}\right)$ and $y_2 = \sin^2\left(\frac{\omega_2}{2}\right)$. If the filters are symmetrical with respect to the vertical and the horizontal axes, the duality condition in (3.1) can be rewritten as

(3.24)  $P(y_1,y_2)\tilde P(y_1,y_2) + P(1-y_1,1-y_2)\tilde P(1-y_1,1-y_2) = 1$,

where $P(y_1,y_2) = M_0(\omega_1,\omega_2)$, $\tilde P(y_1,y_2) = \tilde M_0(\omega_1,\omega_2)$.

We see that a possible choice for $P$ and $\tilde P$ is given by

(3.25)  $P(y_1,y_2) = p(a y_1 + (1-a) y_2)$

(3.25')  $\tilde P(y_1,y_2) = \tilde p(a y_1 + (1-a) y_2)$

where $a$ is in $[0,1]$. For an optimal isotropy it is natural to choose $a = \frac12$: in this case the diagonals are also symmetry axes. This choice is known in signal processing as the McClellan transform of the 1D filters $p$ and $\tilde p$. Using the variable $z = \frac12(y_1+y_2)$ we can thus write

(3.26)  $M_0(\omega) = p(z)$  and  $\tilde M_0(\omega) = \tilde p(z)$

where $p$ and $\tilde p$ are polynomials satisfying (3.24). These polynomials must also satisfy

(3.27)  $p(0) = \tilde p(0) = 1$  and  $p(1) = \tilde p(1) = 0$

which are necessary for the construction of wavelet bases. Note that we have

(3.28)  $z = \frac12\left(\sin^2\frac{\omega_1}{2} + \sin^2\frac{\omega_2}{2}\right)$

and thus $z$ can be regarded as the transfer function of the filter which computes the discrete Laplacian with the formula

(3.29)  $\Delta_{m,n} = \frac18\left(4 s_{m,n} - s_{m-1,n} - s_{m+1,n} - s_{m,n-1} - s_{m,n+1}\right)$ .
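The link between the variable $z$ and the Laplacian stencil can be confirmed numerically. The sketch below assumes the five-point stencil with weight 4/8 at the centre and -1/8 on the four nearest neighbours, and compares its Fourier series with $\frac12(\sin^2\frac{\omega_1}{2} + \sin^2\frac{\omega_2}{2})$; the dictionary layout and function names are illustrative choices.

```python
import cmath
import math

# Assumed stencil weights: centre 4/8, four nearest neighbours -1/8 each.
stencil = {(0, 0): 4 / 8, (1, 0): -1 / 8, (-1, 0): -1 / 8, (0, 1): -1 / 8, (0, -1): -1 / 8}

def transfer(w1, w2):
    # Fourier series of the stencil; it is real because the stencil is symmetric.
    return sum(c * cmath.exp(-1j * (m * w1 + n * w2)) for (m, n), c in stencil.items()).real

def z(w1, w2):
    return 0.5 * (math.sin(w1 / 2) ** 2 + math.sin(w2 / 2) ** 2)

max_gap = max(abs(transfer(0.37 * a, 0.53 * b) - z(0.37 * a, 0.53 * b))
              for a in range(-8, 9) for b in range(-8, 9))
print(max_gap)
```

The agreement follows from $\sin^2(\omega/2) = (1-\cos\omega)/2$, so that $z = \frac18(4 - 2\cos\omega_1 - 2\cos\omega_2)$.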

Since a Laplacian scheme has frequently been proposed in image processing to detect the edges with a maximum isotropy (see [AB], [M]), it seems tempting to use $z$ or one of its powers as a high pass analyzing filter (and thus $1 - z$ as the corresponding low pass synthesis filter). This can be achieved in a very simple way, by a method already used to build biorthogonal bases in $L^2(\mathbb{R})$. Recall that $P_N(z) = \sum_{j=0}^{N-1} \binom{N-1+j}{j} z^j$ is the lowest degree solution of the Bezout problem

(3.30)  $z^N P_N(1-z) + (1-z)^N P_N(z) = 1$ .

If we fix the reconstruction low pass as $M_0^N(\omega) = (1-z)^N$ (so that the analyzing high pass is, up to a shift, the $N$th power of the Laplacian), then a possible choice for the dual filter is given by

(3.31)  $\tilde M_0^{N,L}(\omega) = (1-z)^L P_{N+L}(z)$

where $L$ is a positive integer indicating the cancellation order of $\tilde M_0$ at $\omega = (\pi,\pi)$. $L$ has to be chosen large enough so that both functions $\phi(x)$ and $\tilde\phi(x)$ satisfy the necessary conditions to generate a pair of unconditional Riesz bases (see Theorem 2.1 and Appendix A). We shall examine the properties of these functions and give an estimate of the minimal value of $L$ in Section V.
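The duality of the synthesis filter $(1-z)^N$ and a candidate dual $(1-z)^L P_{N+L}(z)$ reduces to the Bezout identity of order $N+L$, because a shift of $\omega$ by $(\pi,\pi)$ replaces $z$ by $1-z$. The sketch below checks this exactly for a few small values; it treats both filters as polynomials in $z$, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def P(n, z):
    # Lowest degree Bezout polynomial P_n evaluated at z.
    return sum(comb(n - 1 + j, j) * z ** j for j in range(n))

def synthesis(N, z):
    # Reconstruction low pass filter as a polynomial in z.
    return (1 - z) ** N

def dual(N, L, z):
    # Candidate dual filter as a polynomial in z.
    return (1 - z) ** L * P(N + L, z)

def duality_residual(N, L, z):
    # A shift of omega by (pi, pi) replaces z by 1 - z.
    return synthesis(N, z) * dual(N, L, z) + synthesis(N, 1 - z) * dual(N, L, 1 - z) - 1

checks = [duality_residual(N, L, Fraction(k, 9))
          for N in (1, 2, 3) for L in (1, 2, 3) for k in range(10)]
print(all(c == 0 for c in checks))
```

The algebraic identity holds for every positive $L$; the constraint on $L$ discussed in the text concerns the regularity of the resulting functions, which this check does not address.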

We have now at hand two families of filters, orthonormal and biorthogonal, with an arbitrarily high number of vanishing moments. We still have to know if these filters allow us to build wavelet bases with an arbitrarily high regularity as in the one dimensional case ([Dau1], [Co2]). As we shall see in the next two sections, the results of our investigations are very surprising and show that the multidimensional situation contains a lot of new difficulties from this point of view.

IV  Orthonormal Bases of Non-separable Wavelets

Let us consider the family of CQF filters defined by

(4.1)  $M_0^N(\omega) = m_0^N(\omega_1)$

with

(4.2)  $|m_0^N(\omega)|^2 = \left[\cos^2\left(\frac{\omega}{2}\right)\right]^N P_N\left(\sin^2\frac{\omega}{2}\right)$

and the associated scaling functions for the dilations $S$ and $R$:

(4.3)  $\hat\phi_{N,S}(\omega) = \prod_{k=1}^{\infty} M_0^N(S^{-k}\omega)$

(4.4)  $\hat\phi_{N,R}(\omega) = \prod_{k=1}^{\infty} M_0^N(R^{-k}\omega)$ .

IV.1  Orthonormality of the translates

A first requirement is that the $\mathbb{Z}^2$-translates of $\phi_{N,S}$ and $\phi_{N,R}$ are orthonormal. This is a necessary and sufficient condition to generate multiresolution analyses and orthonormal bases of wavelets.

Theorem 4.1 For all $N > 0$, the functions $\phi_{N,S}$ and $\phi_{N,R}$ have orthonormal translates and generate wavelet bases of the type $2^{j/2}\psi(D^j x - k)$, $j \in \mathbb{Z}$, $k \in \mathbb{Z}^2$, $D = S$ or $R$.


Proof: 

By a trivial generalization of Theorem 2.1, this orthonormality is ensured if and only if $|\hat\phi(\omega)| \geq C > 0$ on a compact set $K$ congruent to $[-\pi,\pi]^2$ modulo $2\pi\mathbb{Z}^2$ which contains a neighbourhood of the origin.

It is clear that $M_0^N(\omega)$ vanishes only on the vertical lines $\omega_1 = (2k+1)\pi$, $k \in \mathbb{Z}$. Consequently we see that the simple choice $K = [-\pi,\pi]^2$ is not convenient since for both dilations, we have

(4.5)  $D^{-1}(\pi,\pi) = (\pi,0)$

and thus

(4.6)  $\hat\phi_{N,S}(\pi,\pi) = \hat\phi_{N,R}(\pi,\pi) = 0$ .

Recall that in the one dimensional case, the trivial choice $K = [-\pi,\pi]$ was convenient for the family $m_0^N(\omega)$. Here we have to use a compact set $K$ slightly different from $[-\pi,\pi]^2$ so that $D^{-j}K \cap \{\omega_1 = (2k+1)\pi\}$ is empty for all $j > 0$ and for all $k$ in $\mathbb{Z}$. This can be done very easily by removing small neighbourhoods of $(\pi,\pi)$ and $(-\pi,-\pi)$ and translating them by $(-2\pi,0)$ and $(2\pi,0)$ as shown in figure 7.

One checks easily that all the sets $D^{-j}K$ for $j > 0$ are contained in the strip $|\omega_1| \leq \pi - \varepsilon$, $\varepsilon > 0$, where $M_0^N(\omega)$ does not vanish.

We now have to check the regularity of the scaling functions which have been obtained. We shall see that the results are completely different depending on whether one chooses $S$ or $R$ as the dilation matrix.

Figure 7

The convenient compact set $K$ congruent to $[-\pi,\pi]^2$: neighbourhoods of $(\pi,\pi)$ and $(-\pi,-\pi)$ have been shifted so that $\hat\phi$ does not vanish on $K$


IV.2  The symmetry dilation case

In this case the dilation matrix is $S = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and its inverse is $S^{-1} = \frac12 \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. Since $M_0^N(\omega) = m_0^N(\omega_1)$, we have to consider the sequence $\left((S^{-j}\omega)_1\right)_{j>0}$ for a given $\omega = (\omega_1,\omega_2)$. Clearly, it has the following form:

$\frac12(\omega_1+\omega_2),\ \frac12\omega_1,\ \frac14(\omega_1+\omega_2),\ \frac14\omega_1,\ \ldots,\ 2^{-j}(\omega_1+\omega_2),\ 2^{-j}\omega_1,\ \ldots$

Since $S^{-2} = \frac12 I$, the odd and the even parts are simple dyadic sequences and this leads to

(4.7)  $\hat\phi_{N,S}(\omega) = \hat\phi_N(\omega_1)\,\hat\phi_N(\omega_1+\omega_2)$

(4.8)  $\phi_{N,S}(x) = \phi_N(x_2)\,\phi_N(x_1-x_2)$

where $\phi_N$ is the one dimensional scaling function. The associated wavelet is defined by

(4.9)  $\hat\psi_{N,S}(\omega) = M_1(S^{-1}\omega)\,\hat\phi_{N,S}(S^{-1}\omega) = \hat\phi_N(\omega_1)\,\hat\psi_N(\omega_1+\omega_2)$

or

(4.10)  $\psi_{N,S}(x) = \psi_N(x_2)\,\phi_N(x_1-x_2)$ .

We see here that the scaling function and wavelet are in this case separable in the sense that they can be expressed directly in terms of the one dimensional functions $\phi_N$ and $\psi_N$. This separability can be explained by the fact that $S$ is similar to the matrix $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$, which is simply a dilation by a factor 2 in one direction, followed by an exchange of the axes. The regularity can of course be made arbitrarily high since it is directly given by the Hölder exponent of $\phi_N$.
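The separability can be observed numerically in the simplest case. The sketch below assumes the Haar filter $m_0(\xi) = (1+e^{-i\xi})/2$, for which the one dimensional infinite product has the closed form $e^{-i\xi/2}\sin(\xi/2)/(\xi/2)$, and compares a truncated product over $S^{-k}\omega$ with the product of two one dimensional factors. The test frequency and the truncation length are arbitrary choices.

```python
import cmath
import math

def m0(xi):
    return (1 + cmath.exp(-1j * xi)) / 2          # assumed 1D Haar low-pass filter

def haar_hat(xi):
    # Closed form of prod_{k>=1} m0(xi / 2^k) for the Haar filter.
    if xi == 0:
        return 1.0
    return cmath.exp(-1j * xi / 2) * math.sin(xi / 2) / (xi / 2)

def s_inverse(w):
    # S^{-1} = (1/2) S for S = [[1, 1], [1, -1]].
    return ((w[0] + w[1]) / 2, (w[0] - w[1]) / 2)

def phi_hat_S(w, terms=60):
    """Truncated product prod_{k=1}^{terms} M0(S^{-k} w) with M0(w) = m0(w_1)."""
    value = 1.0
    for _ in range(terms):
        w = s_inverse(w)
        value *= m0(w[0])
    return value

w = (1.3, -2.1)
lhs = phi_hat_S(w)
rhs = haar_hat(w[0] + w[1]) * haar_hat(w[0])
print(abs(lhs - rhs))
```

The odd-indexed factors reproduce the dyadic product at $\omega_1+\omega_2$ and the even-indexed ones the dyadic product at $\omega_1$, which is the mechanism behind the separable formula.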


Remark: 

Theorem  4.1  is  not  necessary  here  to  prove  the  orthonormality  of  the  translates  since  it  is 
a  trivial  consequence  of  the  separability  formulas  (4.7)  and  (4.8). 

We now consider the case of the matrix $R$, which is far less trivial.


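IV.3  The rotation dilation case

We now have $R = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and $R^{-1} = \frac12 \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$. The sequence $\left((R^{-j}\omega)_1\right)_{j>0}$ is then

$\frac12(\omega_1+\omega_2),\ \frac12\omega_2,\ \frac14(\omega_2-\omega_1),\ -\frac14\omega_1,\ -\frac18(\omega_1+\omega_2),\ -\frac18\omega_2,\ \frac1{16}(\omega_1-\omega_2),\ \frac1{16}\omega_1,\ \frac1{32}(\omega_1+\omega_2),\ \frac1{32}\omega_2,\ \ldots$

Here the first power of $R^{-1}$ proportional to the identity is $R^{-4} = -\frac14 I$. Consequently, it is not possible to use the one dimensional scaling functions and wavelets to express the $\phi_{N,R}$ and $\psi_{N,R}$ in a separable way. We first consider the case $N = 1$ which corresponds to the Haar filter. The result of the cascade algorithm with this filter shows how different the situation is when $R$ is used instead of $S$.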
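The first terms of this sequence, and the fact that the fourth power of the inverse matrix is the first one proportional to the identity, can be reproduced with exact rational arithmetic. The sketch assumes $R^{-1} = \frac12 \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$; the helper names are illustrative.

```python
from fractions import Fraction

R_INV = ((Fraction(1, 2), Fraction(1, 2)), (Fraction(-1, 2), Fraction(1, 2)))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def first_rows(count):
    """First rows of R^{-1}, ..., R^{-count}: pairs (a, b) with (R^{-j} w)_1 = a*w1 + b*w2."""
    rows = []
    power = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
    for _ in range(count):
        power = matmul(power, R_INV)
        rows.append(power[0])
    return rows, power

rows, r_inv_4 = first_rows(4)
print(rows)
print(r_inv_4)
```

The four rows correspond to $\frac12(\omega_1+\omega_2)$, $\frac12\omega_2$, $\frac14(\omega_2-\omega_1)$ and $-\frac14\omega_1$.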

IV.3.a  The twin dragon

For $M_0^1(\omega) = \frac{1+e^{-i\omega_1}}{2}$, the scaling function satisfies

(4.11)  $\phi_{1,R}(x) = \phi_{1,R}(Rx) + \phi_{1,R}(Rx-(1,0))$

and

(4.12)  $\hat\phi_{1,R}(\omega) = \prod_{k=1}^{\infty} M_0^1(R^{-k}\omega)$ .

By iteration of the cascade algorithm, one finds that $\phi_{1,R}$ is the characteristic function of a well known fractal set called the "twin dragon" (see [K]) shown in figure 8. This set can be defined directly in the complex plane as

(4.13)  $\Delta = \left\{ \sum_{j=1}^{\infty} \varepsilon_j \left(\frac{1-i}{2}\right)^j ;\ \varepsilon_j \in \{0,1\} \right\}$

and it is clear that $\phi_{1,R} = \chi_\Delta$ solves (4.11) since we have

(4.14)  $\Delta = \left(\frac{1-i}{2}\right)\Delta \cup \left(\frac{1-i}{2}\right)(\Delta+1) \simeq R^{-1}\Delta \cup R^{-1}(\Delta+(1,0))$ .

Figure 8

The "twin dragon" set $\Delta$

The self similarity of $\Delta$ is thus expressed by the two scale difference equation (4.11), but furthermore, since the family $\{\phi_{1,R}(x-k)\}_{k\in\mathbb{Z}^2}$ is orthonormal (by Theorem 4.1) and since $|\Delta| = \hat\phi_{1,R}(0) = 1$, these integer translates constitute a fractal tiling of the whole plane (similarly to the squares obtained in the tensor product situation with the same filter). This beautiful property has been remarked independently by W. Madych and K. Gröchenig [MG] and W. Lawton and H. Resnikoff [LR]. More generally, such tilings can be derived by considering a two scale difference equation of the type

(4.15)  $\phi(x) = \sum_{i=1}^{d} \phi(Dx + e_i)$

where $D$ is a dilation matrix and $e_1, \ldots, e_d$ are $d$ representatives of $\mathbb{Z}^2/D\mathbb{Z}^2$ ($d = |\det D|$). This scaling function and the corresponding wavelet do not seem however of great interest for image processing: not only are they discontinuous but the set of discontinuity is a very chaotic fractal curve. Nevertheless the twin dragon is important in estimating the regularity (local and global) of the wavelets with dilation matrix $R$. Indeed, if we want to generalize the method of [DL] (see Section II.4.G), it is necessary to consider the expansion of any point in $\mathbb{C}$ in terms of the powers of $\left(\frac{1-i}{2}\right)$, which also means that the point is considered as the limit of a "dragonic sequence" $\{\Delta_j\}_{j\in\mathbb{Z}}$ with $\Delta_j \subset \Delta_{j-1}$ and $|\Delta_j| = 2^{-j}$. These "dragonic expansion" techniques are described in Appendix B.
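Finite-depth approximations of the twin dragon can be generated from digit expansions. The sketch below rests on an assumption: identifying $(x_1,x_2)$ with $x_1 + i x_2$, the matrix $R$ acts as multiplication by $1+i$, so $R^{-1}$ acts as multiplication by $\beta = (1-i)/2$, and the set is taken to be all sums $\sum_{j\ge1} \varepsilon_j \beta^j$ with digits in $\{0,1\}$. Points are stored as exact pairs of rationals; the function names are illustrative.

```python
from fractions import Fraction

def times_beta(p):
    # Multiply the Gaussian rational p = (a, b), read as a + ib, by beta = (1 - i) / 2.
    a, b = p
    return ((a + b) / 2, (b - a) / 2)

def level(n):
    """All points sum_{j=1}^{n} eps_j * beta^j with digits eps_j in {0, 1}."""
    pts = {(Fraction(0), Fraction(0))}
    for _ in range(n):
        # Horner step: prepend a digit eps, i.e. p -> beta * (eps + p).
        pts = {times_beta((p[0] + eps, p[1])) for p in pts for eps in (0, 1)}
    return pts

n = 8
coarse, fine = level(n), level(n + 1)
self_similar = fine == ({times_beta(p) for p in coarse}
                        | {times_beta((p[0] + 1, p[1])) for p in coarse})
print(len(coarse), self_similar)
```

Each depth doubles the number of distinct points, and the depth $n+1$ set is the union of the two contracted copies of the depth $n$ set, which is the discrete counterpart of the self similarity relation. Plotting `level(14)` gives a recognizable picture of the set.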

Let  us  now  examine  the  functions  obtained  with  higher  order  filters  which  have  more 
vanishing  moments. 

IV.3.b  Higher order filters

We are interested in the family of scaling functions $\phi_{N,R}$, $N > 1$.

Recall that in the one dimensional case, the asymptotic result ensuring arbitrarily high regularity (Theorem 2.4, Section II.3.b) is based on the value of $m_0\left(\frac{2\pi}{3}\right)$ since $\left\{\frac{2\pi}{3}, -\frac{2\pi}{3}\right\}$ is a cyclic orbit of $\omega \mapsto 2\omega$ modulo $2\pi$. In the present case similar considerations for a fixed orbit of $\omega \mapsto R\omega$ modulo $2\pi\mathbb{Z}^2$ lead to an opposite result: arbitrarily high regularity cannot be obtained by increasing the number of vanishing moments. More precisely, we have

Theorem 4.2 For all $N > 0$, the function $\phi_{N,R}$ is not in $C^1(\mathbb{R}^2)$.


Proof: 

This is of course true for $N = 1$ since we obtain the twin dragon. For $N > 1$, we shall prove a stronger result: the decay at infinity of $\hat\phi_{N,R}(\omega)$ cannot be majorated by $C|\omega|^{-1}$ (which is a necessary condition for $\phi_{N,R}$ to be in $C^1$ because it is a compactly supported function).
For  this  we  consider  the  orbit  of  u;  >—  Ru)  modulo  2<rZ^  given  by  the  four  points  v)’ 

(¥’-¥)’  (“¥’  ^)-  denote  uo  =  t) 

checks  easily  that 

(4.16)  =  C’a'  7^  0  for  all  A'  >  0  . 

We then have, for all $N > 0$,

and thus, since $|v_j| \geq 2^{j/2}$,

$|\hat\phi_{N,R}(v_j)| \geq C_N\, 2^{-j\alpha_N/2} \geq C_N\, |v_j|^{-\alpha_N}$

with $\alpha_N = -\log_2\left[\left(1 - \sin^2\frac{\pi}{5}\right)^N \sum_{j=0}^{N-1} \binom{N-1+j}{j} \left(\sin^2\frac{\pi}{5}\right)^j\right]$. Clearly $\alpha_N$ is decreasing with $N$. Since $\alpha_1 \simeq 0.6115 < 1$, this ends the proof.


Figure 9

Approximation of the scaling function $\phi_{2,R}$

In fact, these wavelets do not even seem continuous although we have no mathematical proof of it. A simple look at the result of the cascade algorithm for the 4-tap filter (which corresponds to a 0.55 Hölder continuous one dimensional wavelet) shows how chaotic the functions $\phi_{N,R}$ can be (figure 9). The design of FIR filters leading to regular wavelet bases with $R$ as the dilation matrix seems to be a difficult problem. Using a polyphase component approach M. Vetterli and J. Kovacevic ([KV], p. 32) have constructed a filter for which the result of the cascade looks continuous, but no infinite family with arbitrarily high regularity has been designed so far.

The main difficulty which makes this design impracticable is the absence of the Riesz lemma in more than one dimension and thus the impossibility to start by designing the square modulus of $M_0$ in an appropriate way. Apart from this problem, the CQF filters (in particular the family (3.21) that we have introduced) cannot be symmetrical. We must keep in mind that one of the interests of the quincunx grid decimation is to have a more isotropic analysis; this is only achieved if the filter coefficients are themselves symmetrical around the horizontal, vertical and diagonal directions.

These two reasons encourage us to construct biorthogonal bases of wavelets from dual filters for which the Riesz lemma is not necessary and linear phase can be achieved.


V  Biorthogonal Bases of Nonseparable Wavelets

Let us recall the family of dual filters introduced in III.3.b. It is based on the variable $z = \frac12\left(\sin^2\frac{\omega_1}{2} + \sin^2\frac{\omega_2}{2}\right)$. We have chosen

(5.1)  $M_0^N(\omega) = (1-z)^N$

and

(5.2)  $\tilde M_0^{N,L}(\omega) = (1-z)^L P_{N+L}(z)$

where $L$ is still to be fixed.

A first remark is that the actions of the dilation matrices $R$ and $S$ on the variable $z$ are equivalent. This is due to the fact that $z$ is invariant if we exchange $\omega_1$ and $\omega_2$ or if we change the sign of one of these variables. We shall thus consider a dilation matrix $D$ which can be equal to $R$ or $S$. To express its action on $z$ we still need the two variables

(5.3)  $y_1 = \sin^2\frac{\omega_1}{2}$  and  $y_2 = \sin^2\frac{\omega_2}{2}$ .

We then have

$z = \frac12(y_1+y_2) \ \xrightarrow{D}\ z = y_1+y_2-2y_1y_2 \ \xrightarrow{D}\ z = \frac12\left(4y_1(1-y_1) + 4y_2(1-y_2)\right) = \frac12(y_1'+y_2') \ \xrightarrow{D}\ z = y_1'+y_2'-2y_1'y_2' \ \ldots$
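The first two steps of this chain can be checked numerically for both dilation matrices. The sketch assumes $z(\omega) = \frac12(y_1+y_2)$ with $y_i = \sin^2(\omega_i/2)$, $R = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and $S = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, and reads the chain as giving the value of $z$ at $D\omega$ and $D^2\omega$ in terms of $y_1, y_2$ at $\omega$; the test frequency is arbitrary.

```python
import math

def ys(w):
    return math.sin(w[0] / 2) ** 2, math.sin(w[1] / 2) ** 2

def z(w):
    y1, y2 = ys(w)
    return 0.5 * (y1 + y2)

def apply(matrix, w):
    return (matrix[0][0] * w[0] + matrix[0][1] * w[1],
            matrix[1][0] * w[0] + matrix[1][1] * w[1])

R = ((1, -1), (1, 1))
S = ((1, 1), (1, -1))

w = (0.7, -1.9)
y1, y2 = ys(w)
one_step = y1 + y2 - 2 * y1 * y2                           # expected value of z(D w)
two_steps = 0.5 * (4 * y1 * (1 - y1) + 4 * y2 * (1 - y2))  # expected value of z(D^2 w)
errors = [abs(z(apply(D, w)) - one_step) for D in (R, S)]
errors += [abs(z(apply(D, apply(D, w))) - two_steps) for D in (R, S)]
print(max(errors))
```

Both matrices give the same values, which illustrates why $R$ and $S$ need not be distinguished when only $z$ matters.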


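We shall at first study the scaling function $\phi_1$ associated to the filter $M_0^1(\omega) = 1 - z$, because it is the elementary building block for the family $\phi_N$ ($= (\phi_1)^{*N}$).

V.1  The quincunx Laplacian scheme

The coefficients of $M_0^1$ are centered around the origin and have the following form:

(5.4)  $\frac18 \begin{pmatrix} & 1 & \\ 1 & 4 & 1 \\ & 1 & \end{pmatrix}$ .

Note that this is the simplest symmetrical filter (with respect to the horizontal, vertical and diagonal directions) which satisfies the cancellation condition $M_0(\pi,\pi) = 0$. To estimate the decay of $\hat\phi_1(\omega)$ we could hope for a bidimensional formula equivalent to

(5.5)  $\prod_{k=1}^{+\infty} \cos(2^{-k}\omega) = \frac{\sin\omega}{\omega}$ ,

used in the one dimensional case. Note that (5.5) is based on the iteration of $\sin\omega = 2\sin\left(\frac{\omega}{2}\right)\cos\left(\frac{\omega}{2}\right)$. Unfortunately, similar relations do not exist in the bidimensional case for the dilation matrix $D$. In particular the infinite product

(5.6)  $\hat\phi_1(\omega) = \prod_{k=1}^{+\infty} M_0(D^{-k}\omega)$

has no simple expression and one checks easily that, unlike (5.5), it does not have uniform decay at infinity. Indeed, let us consider the sets $\left\{\left(\frac{2\pi}{5},\frac{4\pi}{5}\right)\right\}$ and $\left\{\left(\frac{2\pi}{3},\frac{2\pi}{3}\right), \left(\frac{2\pi}{3},0\right)\right\}$.

These are two cyclic orbits of $\omega \mapsto D\omega$ modulo $2\pi\mathbb{Z}^2$ and modulo the exchange of coordinates and sign changes which do not affect the variable $z$. Consequently, if we define $v_j = D^j\left(\frac{2\pi}{5},\frac{4\pi}{5}\right)$ and $\mu_j = D^j\left(\frac{2\pi}{3},\frac{2\pi}{3}\right)$ we have, when $j$ goes to $+\infty$,

$\hat\phi_1(v_j) \sim C\left[\frac{\cos^2\left(\frac{\pi}{5}\right) + \cos^2\left(\frac{2\pi}{5}\right)}{2}\right]^{j} \sim C\,|v_j|^{-\alpha_1}$  with  $\alpha_1 = -2\log_2\left[\frac{\cos^2\left(\frac{\pi}{5}\right) + \cos^2\left(\frac{2\pi}{5}\right)}{2}\right] \simeq 2.83$

and

(5.10)  $\hat\phi_1(\mu_j) \sim C\left[\cos^2\left(\frac{\pi}{3}\right)\,\frac{1+\cos^2\left(\frac{\pi}{3}\right)}{2}\right]^{j/2} \sim C\,|\mu_j|^{-\alpha_2}$  with  $\alpha_2 = -\log_2\left[\cos^2\left(\frac{\pi}{3}\right)\,\frac{1+\cos^2\left(\frac{\pi}{3}\right)}{2}\right] \simeq 2.68 \neq \alpha_1$ .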
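The two decay exponents quoted above, 2.83 and 2.68, can be reproduced numerically. The sketch below is built on assumptions: the filter is $M_0 = 1 - z = \frac12(\cos^2\frac{\omega_1}{2} + \cos^2\frac{\omega_2}{2})$, the two orbits are taken to be $\{(\frac{2\pi}{5},\frac{4\pi}{5})\}$ and $\{(\frac{2\pi}{3},\frac{2\pi}{3}), (\frac{2\pi}{3},0)\}$ (a choice that reproduces the quoted values), and each application of $D$ is assumed to multiply $|\omega|$ by $\sqrt 2$.

```python
import math

def M0(w):
    # M_0 = 1 - z = (cos^2(w1/2) + cos^2(w2/2)) / 2
    return 0.5 * (math.cos(w[0] / 2) ** 2 + math.cos(w[1] / 2) ** 2)

def decay_exponent(orbit):
    """Exponent a with |phi_hat| ~ |w|^(-a) along D^j w, where each step scales |w| by sqrt(2)."""
    mean_log = sum(math.log(M0(w)) for w in orbit) / len(orbit)
    return -mean_log / math.log(math.sqrt(2))

fixed_orbit = [(2 * math.pi / 5, 4 * math.pi / 5)]
two_cycle = [(2 * math.pi / 3, 2 * math.pi / 3), (2 * math.pi / 3, 0.0)]
alpha_1 = decay_exponent(fixed_orbit)
alpha_2 = decay_exponent(two_cycle)
print(round(alpha_1, 2), round(alpha_2, 2))
```

On the first orbit the filter takes the constant value 3/8, on the second the values 1/4 and 5/8 alternate; the two geometric means differ, hence the two different exponents.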


Still we would like to find a global exponent for the decay of $\hat\phi_1(\omega)$ at infinity. For this we shall introduce an "artificial" function which will play the same role as $\cos\omega$ in (5.5). We define

(5.11)  $C(\omega) = \frac{\sin^2\left(\frac{\omega_1+\omega_2}{2}\right) + \sin^2\left(\frac{\omega_1-\omega_2}{2}\right)}{2\left[\sin^2\left(\frac{\omega_1}{2}\right) + \sin^2\left(\frac{\omega_2}{2}\right)\right]}, \qquad C(0) = 1$ .

Contrarily to $M_0(\omega)$, $C(\omega)$ is not a trigonometric polynomial, but it is a bounded regular function which vanishes at the point $(\pi,\pi)$ with the same order of cancellation as $M_0(\omega)$. Moreover, it satisfies by construction

(5.12)  $\prod_{k=1}^{+\infty} C(D^{-k}\omega) = \frac{4\left[\sin^2\left(\frac{\omega_1}{2}\right) + \sin^2\left(\frac{\omega_2}{2}\right)\right]}{|\omega|^2} \leq C(1+|\omega|)^{-2}$ .

The decay of this infinite product is now uniform and, for this reason, $C(\omega)$ will play an important role in the construction of our dual bases. For the moment, by comparing $C(\omega)$ and $M_0(\omega)$, we obtain the following result.

Proposition 5.1 The decay of $\hat\phi_1(\omega)$ at infinity is controlled by

(5.13)  $|\hat\phi_1(\omega)| \leq C(1+|\omega|)^{-2}$ .

Furthermore, this exponent is globally optimal, i.e. there exists a sequence $\omega_j$ such that $\lim_{j\to+\infty} |\omega_j| = +\infty$ and $|\hat\phi_1(\omega_j)| \sim |\omega_j|^{-2}$.

Proof: 

Using  the  variables  j/j  =  sin’  ^  and  yj  =  sin’  ^  we  can  rewrite  C(w). 

+  1^2  -  2yiy;  _  (1  -  yi)y2  +  (1  -  y2)y] 

We  thus  have 


vi  +  y2 


Vt  +  y2 


,ri/ (1  -  yi)y2  + (1  -  y2)i/i  (1  -  yi)  + (1  -  y2) 

C(u>)  -  Mo(u.j  -  - „  ,  , -  -  - 5 - 

yi  +  y:  2 

(1  -  yi)(y2 "  yi)+  (l  -  y2)(y3  -  y2) 


(yi  -  y2)^ 


2(y]  +  y2) 


>  0  . 


2(yi  +  yz) 

Thus  Mq(ui)  <  C(w)  and  by  (5.12)  |<Pi(u;)l  <  C(1  -i-  |u;|)~’.  To  prove  that  this  exponent  is 
optimal  we  consider  a  small  vector  p  ^  0  in  IP;^  and  define 

(5.15)  u,'j  =  D^{~,x)-rp, 
so  that 

(5.16) 


+  CC 


©1 


(u-y)  =  n  ^^0  (P^-*^(r,x)  +  Z?-*p)  . 


*=i 


Let  us  divide  this  product  in  three  parts: 


(5.17) 


©i(-j)  -  ((",  X-)  +  I?  V)] 


2-1 

HM’  (z?^-*(x,x)  +  Z)-*p) 

k=l 


[M^{ix.z)  +  D-^p)] 
Mj)  B{j)  C[j)  . 
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One  checks  easily  that  c<](7r,  tt)  0  and  thus,  for  j  large  enough  or  choosing  p  smaii  enov^h, 
we  have  C  <  C]  <  A{j)  <  1.  It  is  also  clear  that  for  1  <  k  <  j  -  I,  Mq  r  =  1  and 

that  for  f  >  1,  Mq(D‘-[t ,  r)A-o)>  1  -C|jo||  for  a  small  enough,  with  C  >  0.  Consequently, 
if  p  has  been  chosen  s-nall  enough.  1  >  B{j)  >  H/ii  (I  -  ^2"^|1p1|]  >  C2  >  0.  Finally  since 
(tt.  rr)  is  a  second  order  zero  of  the  third  factor  satisfies 

(5.18)  2-^C3|1p|1'  =  C3\\D-^pf  <  CU)  <  C,\\D-^p\\^  =  . 

This  shows  that  0i(wj)  behaves  like  2"-’  ~  when  j  goes  to  +oc  and  the  proposition 

is  proved.  • 
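Two ingredients of this argument lend themselves to a quick numerical check: the pointwise domination of $M_0$ by the auxiliary function, and the closed form of the infinite product of the auxiliary function. The sketch below assumes $M_0 = 1 - \frac12(y_1+y_2)$ and $C = (y_1+y_2-2y_1y_2)/(y_1+y_2)$ with $y_i = \sin^2(\omega_i/2)$, uses $D = R$, and compares the truncated product with $4(y_1+y_2)/|\omega|^2$. That closed form is obtained here from the telescoping structure of the product and should be read as an assumption about the intended formula.

```python
import math

def ys(w):
    return math.sin(w[0] / 2) ** 2, math.sin(w[1] / 2) ** 2

def M0(w):
    y1, y2 = ys(w)
    return 1 - 0.5 * (y1 + y2)

def C(w):
    y1, y2 = ys(w)
    if y1 + y2 == 0:
        return 1.0                      # value by continuity at the origin
    return (y1 + y2 - 2 * y1 * y2) / (y1 + y2)

def r_inverse(w):
    # R^{-1} = (1/2) [[1, 1], [-1, 1]]
    return ((w[0] + w[1]) / 2, (w[1] - w[0]) / 2)

def product_of_C(w, terms=40):
    value = 1.0
    for _ in range(terms):
        w = r_inverse(w)
        value *= C(w)
    return value

grid = [(0.41 * a, 0.29 * b) for a in range(-15, 16) for b in range(-15, 16)]
dominated = all(M0(w) <= C(w) + 1e-12 for w in grid)

w = (5.0, -3.0)
closed_form = 4 * sum(ys(w)) / (w[0] ** 2 + w[1] ** 2)
print(dominated, abs(product_of_C(w) - closed_form))
```

The product telescopes because the auxiliary function at $\omega$ equals $z(D\omega)/(2z(\omega))$, so only the first and last values of $z$ survive.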


Note that from the decay of $\hat\phi_1(\omega)$ we cannot even conclude that it belongs to $L^1(\mathbb{R}^2)$ or that $\phi_1$ is a continuous function. Yet both are true; we are going to prove this by the Littlewood-Paley method exposed in II.4.a. The filter $M_0(\omega)$ and the scaling function $\hat\phi_1(\omega)$ are particularly well adapted for this approach since they are positive, so that the regularity estimation is optimal (because $\|\Delta_j(\phi_1)\|_{L^\infty} \sim \|\widehat{\Delta_j(\phi_1)}\|_{L^1}$; see §II.4.a).


Proposition 5.2 The optimal global Hölder exponent for $\phi_1(x)$ is $\alpha \simeq 0.61$.


Proof:

We consider the transition operator defined by

(5.19)  $TF(D\omega) = M_0(\omega)F(\omega) + M_0(\omega+(\pi,\pi))\,F(\omega+(\pi,\pi))$ .

As in the one dimensional case $T$ can be studied in a finite dimensional space, but this subspace cannot be defined as simply as in (2.61). One way of finding an invariant subspace is to apply $T$ to the constant 1 and then iterate it on the characters $e^{i(k_1\omega_1+k_2\omega_2)}$ which are obtained until a stable set is attained. With $M_0$ corresponding to (5.4), this subspace is trivial, since $T1 = 1$. Lemma 2.5 then guarantees the integrability of $\hat\phi_1$, hence the continuity of $\phi_1$. To estimate the Hölder exponent of $\phi_1$ we need a larger subspace, which we obtain by iterating $T$ on 1 and on $\cos\omega_1 + \cos\omega_2$. The size of the matrix representing the action of $T$ on this subspace can be seriously reduced by exploiting the symmetries, i.e. the invariance under $\omega_1 \leftrightarrow -\omega_1$, $\omega_2 \leftrightarrow -\omega_2$ and $\omega_1 \leftrightarrow \omega_2$.

Using the subspace $E$ generated by the basis

(5.20)  $e_1 = 1, \quad e_2 = \cos\omega_1 + \cos\omega_2, \quad e_3 = \cos(\omega_1+\omega_2) + \cos(\omega_2-\omega_1)$

we obtain the following matrix

(5.21)  $T = \begin{pmatrix} 1 & \frac12 & 0 \\ 0 & \frac12 & 1 \\ 0 & \frac14 & 0 \end{pmatrix}$

which has the eigenvalues $\left\{1, \frac{1+\sqrt5}{4}, \frac{1-\sqrt5}{4}\right\}$. The two last eigenvalues correspond to the subspace $E_0 \subset E$ defined by

(5.22)  $E_0 = \{F(\omega) \in E \,;\ F(0) = 0\}$ .

Similarly to the one dimensional case, we iterate $T$ on the positive function $e_1 - \frac12 e_2$ which is clearly in $E_0$ and this leads us to

(5.23)  $\|\Delta_{j/2}(\phi_1)\|_{L^\infty} \leq C\left(\frac{1+\sqrt5}{4}\right)^{j}$

where $\Delta_{j/2}(\phi_1)$ is the Littlewood-Paley block corresponding to the region $D^j([-\pi,\pi]^2) \setminus D^{j-1}([-\pi,\pi]^2)$ situated at a distance $2^{j/2}$ of the origin. Consequently, if we define

(5.24)  $\alpha = -\frac{\log\left(\frac{1+\sqrt5}{4}\right)}{\log\sqrt2} \simeq 0.61$

it follows from (5.23) that

(5.25)  $(1+|\omega|)^{\alpha}\,\hat\phi_1(\omega) \in L^1(\mathbb{R}^2)$  and  $\phi_1(x) \in C^{\alpha}(\mathbb{R}^2)$ .

Consequently $\phi_1$ is Hölder continuous with regularity 0.61. ■
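The value 0.61 can be recovered from the spectrum of the transition matrix. In the sketch below the entries of the matrix (1/2, 1/2, 1/4 in the second column and 1 in the third) are a reconstruction obtained by applying the transition operator to the basis functions with $D = S$ and $M_0 = \frac12 + \frac14(\cos\omega_1 + \cos\omega_2)$, so they should be treated as an assumption; the code first checks one column against the definition of the operator and then computes the exponent.

```python
import math

# Assumed matrix of T in the basis e1 = 1, e2 = cos(w1) + cos(w2),
# e3 = cos(w1 + w2) + cos(w1 - w2); column j holds the coordinates of T e_j.
T = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0],
     [0.0, 0.25, 0.0]]

def M0(w1, w2):
    return 0.5 + 0.25 * (math.cos(w1) + math.cos(w2))

def e2(w1, w2):
    return math.cos(w1) + math.cos(w2)

def e3(w1, w2):
    return math.cos(w1 + w2) + math.cos(w1 - w2)

def T_e2_at_Sw(w1, w2):
    # Definition of the transition operator applied to F = e2; the result is a function of S w.
    return (M0(w1, w2) * e2(w1, w2)
            + M0(w1 + math.pi, w2 + math.pi) * e2(w1 + math.pi, w2 + math.pi))

w1, w2 = 0.83, -0.37
x1, x2 = w1 + w2, w1 - w2                       # coordinates of S w
column_2 = T[0][1] + T[1][1] * e2(x1, x2) + T[2][1] * e3(x1, x2)
column_gap = abs(T_e2_at_Sw(w1, w2) - column_2)

# Eigenvalues of the 2x2 block acting on e2, e3 (the eigenvalue 1 belongs to e1).
a, b, c, d = T[1][1], T[1][2], T[2][1], T[2][2]
root = math.sqrt((a + d) ** 2 - 4 * (a * d - b * c))
lam_max = max(abs((a + d + root) / 2), abs((a + d - root) / 2))
alpha = -math.log(lam_max) / math.log(math.sqrt(2))
print(column_gap, lam_max, alpha)
```

The largest eigenvalue of the block is $(1+\sqrt5)/4 = \cos(\pi/5)$, and dividing its logarithm by $\log\sqrt2$ accounts for the fact that each application of $D$ scales frequencies by $\sqrt2$.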

This property appears in the graph of $\phi_1$ on figure 10 (obtained by the cascade algorithm) which presents a smooth aspect with several pointwise cusps. Note that this regularity is not sufficient to derive a better decay of $\hat\phi_1(\omega)$ than (5.13). Propositions 5.1 and 5.2 are thus complementary.

Remarks:

• Note that, since we have

(5.26)  $M_0(\omega) + M_0(\omega+(\pi,\pi)) = 1$ ,

we can derive the convergence of the truncated products $\hat\phi_1^n(\omega) = \prod_{j=1}^{n} M_0(D^{-j}\omega)\,\chi_{D^n([-\pi,\pi]^2)}(\omega)$ with the same method as in the orthonormal case for the convergence (Theorem 2.1). This leads us to a Poisson summation formula

(5.27)  $\sum_{k\in\mathbb{Z}^2} \hat\phi_1(\omega+2k\pi) = 1$

which is equivalent to

(5.28)  $\phi_1(n_1,n_2) = 1$ if $n_1 = n_2 = 0$,  $0$ if $(n_1,n_2) \in \mathbb{Z}^2\setminus\{0\}$ .

This interpolating property of $\phi_1$ has been noticed in approximation theory by Deslauriers and Dubuc [DD]. It explains the four cusps surrounding the center at the points $(0,1)$, $(1,0)$, $(0,-1)$ and $(-1,0)$ which are visible on figure 10. However, a sharper analysis shows that the isolated points where $\phi_1(x) = 0$ are an infinite family.

Figure 10

The scaling function $\phi_1(x)$


• As mentioned in section III.3.b, the variable $z = \frac12(y_1+y_2)$ can be replaced by, more generally, $z_\lambda = \lambda y_1 + (1-\lambda)y_2$ with $\lambda \in [0,1]$; $M_0^\lambda = 1 - z_\lambda$ is still positive. Let us now distinguish the dilation matrices $R$ and $S$. Then, a similar analysis in the case of $D = R$ leads to a $5 \times 5$ matrix in the basis

$(e_1, e_2, e_3, e_4, e_5) = (1,\ \cos\omega_1,\ \cos\omega_2,\ \cos(\omega_1+\omega_2),\ \cos(\omega_1-\omega_2))$

(5.29)  $T_\lambda = \frac12 \begin{pmatrix} 2 & \lambda & 1-\lambda & 0 & 0 \\ 0 & 1-\lambda & \lambda & 0 & 2 \\ 0 & 1-\lambda & \lambda & 2 & 0 \\ 0 & \lambda & 0 & 0 & 0 \\ 0 & 0 & 1-\lambda & 0 & 0 \end{pmatrix}$

and numerical computations show that the "isotropic value" $\lambda = \frac12$ gives the highest index of regularity. The lowest index of regularity is attained for $\lambda = 0$ or 1. Note that $\lambda = 1$ corresponds to the convolution product $g(x) = \chi_\Delta * \chi_\Delta$ where $\Delta$ is the twin dragon introduced in IV.3.a. The Hölder exponent is then $\alpha \simeq 0.47$.


•  To estimate the decay of $\hat g$ ($= (\hat\chi_\Delta(\omega))^2$), one can again use the function $C(\omega)$ of Proposition 5.1, in a slightly different way. Remark that, if we define $G(\omega) = 1 - z_1 = 1 - y_1$, then

$C(\omega) - G(\omega) = \frac{(1-y_1)y_2 + (1-y_2)y_1}{y_1+y_2} - (1-y_1) = \frac{y_1(y_1-y_2)}{y_1+y_2} \ge 0$  if  $y_1 \ge y_2$

and

$2C(\omega) - G(\omega) = \frac{2[(1-y_1)y_2 + (1-y_2)y_1]}{y_1+y_2} - (1-y_1) = \frac{(1-y_1)(y_2-y_1) + 2y_1(1-y_2)}{y_1+y_2} \ge 0$  if  $y_2 \ge y_1$ .

On the other hand

(5.30)  $|\hat g(\omega)| = \prod_{k=1}^{\infty} G(R^{-k}\omega)$ ;

to majorate $|\hat g(\omega)|$ for $2^{j/2} \le |\omega| \le 2^{(j+1)/2}$ we only need to majorate the $j$ first factors in (5.30). Since $R$ rotates by $\frac{\pi}{4}$, half of the factors can be majorated by $C(\omega)$ and the others by $2C(\omega)$. This leads to

(5.31)  $|\hat g(\omega)| \le C\,2^{j/2} \prod_{k=1}^{j} C(R^{-k}\omega)$

and thus

(5.32)  $\hat g(\omega) \le C(1+|\omega|)^{-1}$ .

It is easy to check (in a similar way as for $\hat\phi_1(\omega)$) that this estimate is optimal. An immediate consequence is that the Fourier transform of the twin dragon characteristic function satisfies

(5.33)  $\hat\chi_\Delta(\omega) \le C(1+|\omega|)^{-1/2}$

which was not obvious since we did not have a formula similar to (5.5) for $\hat\chi_\Delta$.
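The regularity indices quoted in the second remark can be checked numerically from the matrix of (5.29). The sketch below is only an illustration under stated assumptions: it takes the matrix with an overall factor 1/2 (so that the constant function is fixed), discards the eigenvalue 1 attached to the constant by keeping the lower-right 4 x 4 block, and uses the relation exponent = -2 log2(rho), which is an assumption based on the fact that each iteration dilates by the square root of 2. The function names are hypothetical.

```python
import numpy as np

def transition_matrix(lam):
    """Matrix of (5.29) in the basis (1, cos w1, cos w2, cos(w1+w2), cos(w1-w2)).
    The overall factor 1/2 is an assumption (it makes the constant a fixed point)."""
    a, b = lam, 1.0 - lam
    return 0.5 * np.array([
        [2, a, b, 0, 0],
        [0, b, a, 0, 2],
        [0, b, a, 2, 0],
        [0, a, 0, 0, 0],
        [0, 0, b, 0, 0],
    ], dtype=float)

def holder_index(lam):
    """Assumed regularity index -2*log2(rho), where rho is the spectral radius
    of T_lambda once the eigenvalue 1 of the constant function is removed."""
    T = transition_matrix(lam)
    # The first column is (1, 0, 0, 0, 0): T is block triangular, so the
    # remaining eigenvalues are those of the lower-right 4x4 block.
    rho = np.max(np.abs(np.linalg.eigvals(T[1:, 1:])))
    return -2.0 * np.log2(rho)

if __name__ == "__main__":
    for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(lam, round(holder_index(lam), 4))
```

With these conventions the isotropic value gives about 0.61 and the extreme values give about 0.476, in agreement with the figures quoted in the text.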

We  now  return  to  the  construction  of  our  biorthogonal  bases  and  attack  the  problem  of 
obtaining  isotropic  wavelet  bases  with  arbitrarily  high  regularity. 


V.2  Biorthogonal  wavelet  bases  with  arbitrarily  high  regularity 

We now consider the whole family of filters $\{M_0^N(\omega), \tilde M_0^{N,L}(\omega)\}_{N,L>0}$ defined by (5.1) and (5.2).

A first remark is that the regularity of the functions $\phi_N$ increases linearly with $N$. More precisely, since

(5.34)  $\phi_N(x) = (*)^N \phi_1(x)$ ,

we can use the characterization of the optimal decay exponent for $\hat\phi_1$ established in Proposition 5.1 to estimate the regularity index $\alpha(N)$ of $\phi_N(x)$. This leads to

(5.35)  $2N - 2 \le \alpha(N) \le 2N$

and thus

(5.36)  $\lim_{N\to+\infty} \frac{\alpha(N)}{N} = 2$ .

The estimate (5.36) is of course more interesting for large values of $N$ than for small values where the error is comparable with the regularity.

For $N = 1$, we have seen that $\alpha \simeq 0.61$.

For $N = 2$, the Littlewood-Paley approach is still reasonable; using the symmetries reduces the size of the matrix to $9 \times 9$. Analyzing the eigenvalues, one finds that $\phi_2$ is in $C^{\alpha}$ with $\alpha \simeq 2.93$. The function $\phi_2 = \phi_1 * \phi_1$ looks very smooth indeed on Figure 13.

For $N = 3$, the matrix becomes too large to tackle by hand. In all cases the regularity of the wavelet $\psi_{N,L}$ will of course be the same as that of $\phi_N$. The problem is now to find the appropriate dual function for the analysis. More precisely we want to design the filter

(5.37)  $\tilde M_0^{N,L}(\omega) = (1-z)^L\,P_{N+L}(z)$

by choosing the number $L$ in such a way that the hypotheses of Theorem 2.2 (in its bidimensional generalization) are satisfied, i.e. that we have at least

(5.38)  $|\hat{\tilde\phi}_{N,L}(\omega)| \le C(1+|\omega|)^{-1-\epsilon}$ ,  $\epsilon > 0$ .

To show that such a choice is possible for any value of $N$ (i.e. for an arbitrarily regular synthesis function), we need an asymptotical result of the same nature as Theorem 2.4. We want to be sure that the regularizing action of the factor $(1-z)^L$ can compensate the inverse effect of $P_{N+L}$ if $L$ is large enough.

Using a similar approach, we consider the simplest fixed point of $\omega \mapsto D\omega$ modulo $2\pi\mathbb{Z}^2$, and modulo sign changes and the exchange of $\omega_1$ and $\omega_2$. This fixed point is $\omega_0 = (\frac{2\pi}{5}, \frac{4\pi}{5})$, which corresponds to $z(\omega_0) = \frac12(y_1+y_2)(\omega_0) = \frac58$.

We now decompose $\tilde M_0^{N,L}$ into three factors, by introducing the function $C(\omega)$ defined by (5.11):

(5.39)  $\tilde M_0^{N,L}(\omega) = [C(\omega)]^L\,[G(\omega)]^L\,P_{N+L}(z) = [C(\omega)]^L\,B_{N,L}(\omega)$ ,

where

$G(\omega) = \frac{1-z}{C(\omega)} = \frac{(y_1+y_2)(2-y_1-y_2)}{2(y_1+y_2-2y_1y_2)}$ .

We already know from section II.3.b that

(5.40)  $P_N(z) \le (4z)^{N-1}$  if  $z \ge \frac12$ .

From the Bezout relation (3.30), we also have

(5.41)  $P_N(z) \le (1-z)^{-N}$ .

Consequently, we can roughly majorate $P_N(z)$ by

(5.42)  $P_N(z) \le \left[\min\left(\frac{1}{1-z}, \max(4z,2)\right)\right]^N$  if  $z \in [0,1]$ .

Defining $H(\omega) = \min\left(\frac{1}{1-z}, \max(4z,2)\right)$ and $\mathcal{G}(\omega) = H(\omega)G(\omega)$, (5.39) leads us to

(5.43)  $B_{N,L}(\omega) \le [H(\omega)]^N\,[\mathcal{G}(\omega)]^L$ .

We are now facing a similar situation as in Theorem 2.4 where we had shown that the function $g(y) = \max(2,4y) = h(\omega)$ satisfied

(5.44)  $h(\omega) = g(y) \le g(\frac34)$  if  $y \le \frac34$ ,
        $h(\omega)h(2\omega) = g(y)\,g(4y(1-y)) \le [g(\frac34)]^2$  if  $\frac34 \le y \le 1$ .
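The one-dimensional inequalities recalled here are easy to probe on a grid. The following sketch assumes the reading g(y) = max(2, 4y) with threshold y = 3/4 (so g(3/4) = 3); it is a numerical sanity check, not a proof, and the helper name is hypothetical.

```python
import numpy as np

def g(y):
    """The majorant g(y) = max(2, 4y) used for the one-dimensional estimate."""
    return np.maximum(2.0, 4.0 * y)

def check_one_dimensional_bounds(n=2001, tol=1e-12):
    """Check g(y) <= g(3/4) on [0, 3/4] and
    g(y) * g(4y(1-y)) <= g(3/4)^2 on [3/4, 1], on a uniform grid."""
    bound = g(0.75)
    y_low = np.linspace(0.0, 0.75, n)
    y_high = np.linspace(0.75, 1.0, n)
    single = bool(np.all(g(y_low) <= bound + tol))
    pair = bool(np.all(g(y_high) * g(4.0 * y_high * (1.0 - y_high))
                       <= bound ** 2 + tol))
    return single, pair

if __name__ == "__main__":
    print(check_one_dimensional_bounds())
```

The second inequality is tight at y = 3/4, which is a fixed point of y -> 4y(1-y); this is the one-dimensional counterpart of the role played by the fixed point in the bidimensional argument.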


In the present case, although we do not have any simple mathematical proof, numerical evidence shows that we have

(5.45)  $\mathcal{G}(\omega)\,\mathcal{G}(D\omega) \le [\mathcal{G}(\omega_0)]^2$  or if not,  $\mathcal{G}(\omega)\,\mathcal{G}(D\omega)\,\mathcal{G}(D^2\omega) \le [\mathcal{G}(\omega_0)]^3$

and similarly

(5.46)  $H(\omega)H(D\omega) \le [H(\omega_0)]^2$  or if not,  $H(\omega)H(D\omega)H(D^2\omega) \le [H(\omega_0)]^3$ .

These two statements are illustrated respectively in Figures 11 and 12. On a) and b) of each of these figures we have plotted the functions $\max(F(\omega)F(D\omega), [F(\omega_0)]^2) - [F(\omega_0)]^2$ and $\max(F(\omega)F(D\omega)F(D^2\omega), [F(\omega_0)]^3) - [F(\omega_0)]^3$ for $F = \mathcal{G}$ and $H$ (the coordinates are $(y_1,y_2) \in [0,1]^2$). On c) the supports of a) and b) are shown to be disjoint regions in $[0,1]^2$.

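We now estimate $\hat{\tilde\phi}_{N,L}(\omega)$. From (5.39) and (5.43) we get

$|\hat{\tilde\phi}_{N,L}(\omega)| = \prod_{k=1}^{+\infty} \tilde M_0^{N,L}(D^{-k}\omega) \le C(1+|\omega|)^{-2L} \prod_{k=1}^{j} B_{N,L}(D^{-k}\omega) \le C(1+|\omega|)^{-2L} \prod_{k=1}^{j} [H(D^{-k}\omega)]^N\,[\mathcal{G}(D^{-k}\omega)]^L$ ,  with  $j \simeq \frac{2\log(1+|\omega|)}{\log 2}$ .

Using (5.45) and (5.46) to divide these products in groups of two or three factors which satisfy one of the inequalities, this leads to

(5.47)  $|\hat{\tilde\phi}_{N,L}(\omega)| \le C(1+|\omega|)^{-2L + 2L\frac{\log(\mathcal{G}(\omega_0))}{\log 2} + 2N\frac{\log(H(\omega_0))}{\log 2}}$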
a)  Graph of $H_1(y_1,y_2) = \max(H(\omega)H(D\omega), [H(\omega_0)]^2) - [H(\omega_0)]^2$

b)  Graph of $H_2(y_1,y_2) = \max(H(\omega)H(D\omega)H(D^2\omega), [H(\omega_0)]^3) - [H(\omega_0)]^3$

c)  Compared supports of $H_1(y_1,y_2)$ and $H_2(y_1,y_2)$

Figure 12

Graphic proof of (5.46)
or, equivalently,

(5.47')  $|\hat{\tilde\phi}_{N,L}(\omega)| \le C(1+|\omega|)^{2L(\alpha-1)+2N\beta}$ ,

with

$\alpha = \frac{\log(\mathcal{G}(\omega_0))}{\log 2} \simeq 0.907$  and  $\beta = \frac{\log(H(\omega_0))}{\log 2} \simeq 1.322$ .

Fortunately $\alpha < 1$. This means $\tilde\phi_{N,L}(x)$ can be made arbitrarily regular by choosing $L$ large enough. In particular, (5.38) will be satisfied if we have

(5.48)  $2L(\alpha-1) + 2N\beta < -1$ .

The smallest $L$ such that $\psi_{N,L}$ and $\tilde\psi_{N,L}$ generate unconditional multiscale bases is therefore given asymptotically by

(5.49)  $L(N) \simeq \frac{\beta}{1-\alpha}\,N = \frac{\log 5 - \log 2}{\log 16 - \log 15}\,N \simeq 14.2\,N$ .
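The constants entering (5.48) and (5.49) can be recomputed directly from the definitions of C, G and H. The sketch below assumes that the fixed point is (2*pi/5, 4*pi/5), a choice that is not legible in the source but reproduces the quoted values 0.907 and 1.322; the function name is hypothetical.

```python
import math

def asymptotic_constants(w0=(2 * math.pi / 5, 4 * math.pi / 5)):
    """Return (alpha, beta, K) with K = beta / (1 - alpha), the asymptotic
    ratio L(N)/N. The fixed point w0 is an assumption (see the lead-in)."""
    y1, y2 = (math.sin(w / 2) ** 2 for w in w0)
    z = 0.5 * (y1 + y2)                          # equals 5/8 at w0
    C = (y1 + y2 - 2 * y1 * y2) / (y1 + y2)      # equals 1/2 at w0
    G = (1 - z) / C                              # equals 3/4 at w0
    H = min(1 / (1 - z), max(4 * z, 2))          # equals 5/2 at w0
    alpha = math.log2(G * H)                     # log2(15/8)
    beta = math.log2(H)                          # log2(5/2)
    return alpha, beta, beta / (1 - alpha)

if __name__ == "__main__":
    alpha, beta, K = asymptotic_constants()
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, L(N)/N = {K:.3f}")
```

Note that the ratio obtained from the unrounded constants is about 14.20, while dividing the rounded values 1.322 and 1 - 0.907 gives the figure 14.215 that appears in Theorem 5.3.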


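This asymptotical estimate is moreover optimal. Indeed define $\omega_j = D^j\omega_0$. Because of the fixed point property of $\omega_0$, we clearly have

(5.50)  $\hat{\tilde\phi}_{N,L}(\omega_j) = \hat{\tilde\phi}_{N,L}(\omega_0)\,[\tilde M_0^{N,L}(\omega_0)]^{j} = C\,|\omega_j|^{-\gamma}$ ,

(5.51)  $\gamma = -2\,\frac{\log(\tilde M_0^{N,L}(\omega_0))}{\log 2}$ .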


From  the  definition  (5.2)  of  Mq  '^,  we  get 


= 


■Pa’+L 


3\^  /2(A’-M-  1) 


N  ~l-l 


N  +  L-1 


L  /  r  \  A  -fZ/ 


^  vie;  U; 


and thus

$\gamma \le C + 2L\,\frac{\log 16 - \log 15}{\log 2} - 2N\,\frac{\log 5 - \log 2}{\log 2} = C + 2L(1-\alpha) - 2N\beta$ .

It follows that the estimate (5.49), if true, is certainly optimal. While we expect (5.45), (5.46), hence (5.49), to be true, we have unfortunately no rigorous proof. However, we can prove inequalities which are slightly less strong than (5.45), (5.46), leading to a non-optimal but rigorous estimate for $L(N)$. More precisely, we can prove that $\Omega = [-\pi,\pi]^2$ can be split up as $\Omega = \Omega_1 \cup \Omega_2 \cup \Omega_3$, with

(5.52)  $\mathcal{G}(\omega) \le 2\nu$  if  $\omega \in \Omega_1$ ,
        $\mathcal{G}(\omega)\,\mathcal{G}(D\omega) \le (2\nu)^2$  if  $\omega \in \Omega_2$ ,
        $\mathcal{G}(\omega)\,\mathcal{G}(D\omega)\,\mathcal{G}(D^2\omega) \le (2\nu)^3$  if  $\omega \in \Omega_3$ ,

with $\nu \le .9588 < 1$, resulting in (5.47') with

$\alpha = \frac{\log(2\nu)}{\log 2} \simeq .93932$ .

If we use the crude estimate $H(\omega) \le 4$ for all $\omega \in [-\pi,\pi]^2$, corresponding to $\beta = 2$, then this leads to

$L(N) \simeq \frac{2N}{1-\alpha} \simeq 32.959\,N$ ;

this factor is about twice as large as in (5.49). The detailed proof of this estimate is in Appendix C.
Appendix  C. 

All  these  results  can  be  summarized  in  the  following  theorem: 

Theorem 5.3  The family of dual filters $\{M_0^N(\omega), \tilde M_0^{N,L}(\omega)\}_{N,L>0}$ generates biorthogonal bases of compactly supported wavelets with arbitrarily high regularity. For large values of $N$, the Hölder exponent of $\phi_N(x)$ is equivalent to $2N$ and the minimal choice for $L$ is asymptotically proportional to $N$,

(5.53)  $L(N) \simeq \mathcal{K}\,N$ ,

with $14.215 \le \mathcal{K} \le 32.959$.

Here the upper bound on $\mathcal{K}$ is not tight, and we expect $\mathcal{K} = 14.215$ to hold, as indicated above.


Remark: 

By taking $L$ larger than $L(N)$, $\tilde\phi_{N,L}$ can also be made arbitrarily regular. However, in many applications such as coding, approximation, data storage and compression, we do not really care about the regularity of the analyzing functions and only the synthesis functions $\psi$ and $\phi$ have to be smooth, since this property is important for the cascade-reconstruction algorithm. This justifies the choice of the minimal value $L(N)$ such that the families $\{2^{-j/2}\psi_{N,L}(D^{-j}x - k)\}$ and $\{2^{-j/2}\tilde\psi_{N,L}(D^{-j}x - k)\}$, $j \in \mathbb{Z}$, $k \in \mathbb{Z}^2$, are unconditional dual bases of $L^2(\mathbb{R}^2)$. Recall that the existence of frame bounds is essential for the stability of the subband coding scheme.

We  end  this  section  by  taking  a  closer  look  at  the  size  of  these  dual  filters. 
V.3  Size and optimal implementation of the dual filters

The asymptotical ratio $L(N)/N \simeq 14.2$ is big in the sense that the filter $\tilde M_0^{N,L(N)}$ may have a very large number of taps. More precisely, a polynomial $P(z)$ of degree $p$ corresponds to a filter with $p^2 + (p+1)^2$ nonzero coefficients. For example, if $N = 3$, $\tilde M_0^{N,L(N)}$ is according to (5.49) a polynomial of degree $p = N + 2L(N) \simeq 87$ in $z$. Consequently it is the transfer function of a filter with approximately 15300 taps!

It seems thus that the dual filter is, even for small values of $N$, much too large for a realistic implementation. This is not quite true for several reasons.

First, one can factorize the polynomial $P_{N+L(N)}(z)$ and express $\tilde M_0^{N,L(N)}$ as a product of $p$ monomials in $z$. By applying successively these monomial filters instead of using directly their product, the number of multiplications per sample in the filtering process is reduced from order $p^2$ to $p$. Note that this complexity reduction associated with the factorization is due to the multidimensional situation and does not occur in the 1D case.

Second, the filter corresponding to the variable $z$, i.e. the discrete Laplacian scheme, has coefficients $c_{0,0} = \frac12$ and $c_{1,0} = c_{-1,0} = c_{0,1} = c_{0,-1} = -\frac18$; it can thus be implemented by using binary shifts instead of multiplications. This is very important since a binary shift is usually performed 10 times faster than an addition and 100 times faster than a multiplication in most processors. This shows that only the additions count here. If $t$ is the time for one multiplication, each monomial filter will generate one sample in approximately $\frac{3t}{5}$ and the same operation will take $\frac{3pt}{5}$ for the whole filter. For $N = 3$ and $p = 87$, this corresponds to the complexity of a 52 tap filter, which is much more reasonable than the first estimation.
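As an illustration of the shift-based implementation, the sketch below applies the five-tap filter associated with the variable z to an integer image. It assumes the coefficients 1/2 (center) and -1/8 (four nearest neighbours), which follow from z = 1/2 - (cos w1 + cos w2)/4, and it uses periodic boundary conditions purely for convenience; both function names are hypothetical.

```python
import numpy as np

def neighbour_sum(x):
    """Sum of the four nearest neighbours with periodic boundary conditions."""
    return (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)
            + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1))

def z_filter_shifts(x):
    """Five-tap filter (1/2 at the center, -1/8 on the neighbours) computed
    with right shifts and additions only; x must be an integer array.
    Shifts round toward minus infinity, so the result is exact only when
    no low-order bits are dropped."""
    return (x >> 1) - (neighbour_sum(x) >> 3)

def z_filter_float(x):
    """Reference floating-point version of the same filter."""
    x = x.astype(float)
    return 0.5 * x - 0.125 * neighbour_sum(x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = 8 * rng.integers(-50, 50, size=(6, 7))  # multiples of 8: shifts are exact
    print(np.array_equal(z_filter_shifts(image), z_filter_float(image)))
```

A monomial factor of the dual filter would combine one such pass with a subtraction of a multiple of the input; in fixed-point practice one would keep a few guard bits so that the rounding of the shifts does not accumulate over the successive passes.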

Finally, for small values of $N$, it is clear that the asymptotical estimate (5.49) of $L(N)$ is far from sharp, just as, in 1D, the asymptotical estimate on regularity of section II.3.b was ill-suited to small filters.

A better estimate for $L(N)$ can be found by checking that the optimal decay exponent for $\hat{\tilde\phi}_{N,L}(\omega)$ is exactly determined by the value of $\tilde M_0^{N,L}$ at $\omega_0 = (\frac{2\pi}{5}, \frac{4\pi}{5})$. More precisely recall that we have

$\tilde M_0^{N,L}(\omega) = [C(\omega)]^L\,B_{N,L}(\omega)$ .

For the small values of $N$ and $L$ considered below, one can check by the same graphical arguments that the inequalities (5.45) or (5.46) are also satisfied by $B_{N,L}$, i.e.

(5.54)  $B_{N,L}(\omega)\,B_{N,L}(D\omega) \le [B_{N,L}(\omega_0)]^2$  or if not,
        $B_{N,L}(\omega)\,B_{N,L}(D\omega)\,B_{N,L}(D^2\omega) \le [B_{N,L}(\omega_0)]^3$ .

Figure 13

The scaling function $\phi_2$ ($= \phi_1 * \phi_1$)

b)  $\phi_1$

Figure 14

Analysis and synthesis scaling function for $N = 1$ and $L = 2$

In order for (5.38) to be satisfied, we therefore only need

$\tilde M_0^{N,L}(\omega_0) < 2^{-1/2}$

and this will be sufficient for these small values of $N$ and $L$ for which (5.52) holds. Using the definition (5.2) of $\tilde M_0^{N,L}$ we obtain:

•  for $N = 1$, $L(1) = 3$

•  for $N = 2$, $L(2) = 12$

•  for $N = 3$, $L(3) = 22$
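The small-$N$ values listed above can be explored with exact rational arithmetic. The sketch below rests on two assumptions that are not legible in the source: that the fixed point corresponds to z = 5/8, and that the decay criterion reads "value of the dual filter at the fixed point smaller than 2^(-1/2)", with the dual filter taken as (1-z)^L P_{N+L}(z) and P_K(z) the usual polynomial sum of binom(K-1+j, j) z^j for j below K. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def P(K, z):
    """P_K(z) = sum_{j=0}^{K-1} binom(K-1+j, j) z^j (assumed form of the
    polynomial appearing in the dual filter)."""
    return sum(comb(K - 1 + j, j) * z ** j for j in range(K))

def dual_filter_at_fixed_point(N, L, z0=Fraction(5, 8)):
    """Value (1 - z0)^L * P_{N+L}(z0) of the dual filter at the fixed point."""
    return (1 - z0) ** L * P(N + L, z0)

def minimal_L(N, L_max=200):
    """Smallest L with dual_filter_at_fixed_point(N, L) < 2**-0.5,
    tested exactly through 2 * m^2 < 1."""
    for L in range(1, L_max + 1):
        m = dual_filter_at_fixed_point(N, L)
        if 2 * m * m < 1:
            return L
    raise ValueError("no admissible L found up to L_max")

if __name__ == "__main__":
    for N in (1, 2, 3):
        print(N, minimal_L(N))
```

Under these assumptions the criterion returns 3 for N = 1 and 12 for N = 2, matching the first two items of the list; the case N = 3 can be examined the same way.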

Clearly these estimates are much better than (5.49). Finally, $L(N)$ can be even more reduced, for small values of $N$, if an even sharper criterion than the frequency decay (5.38) is used to ensure the existence of frame bounds. We show indeed in Appendix A that the spectral analysis of the transition operators $T_0$ and $\tilde T_0$ corresponding to the functions $|M_0^N|^2$ and $|\tilde M_0^{N,L}|^2$ can be used to derive both the frame property and the convergence needed to have a pair of dual Riesz bases. In [CD] we prove that this criterion is sharp so that the value of $L(N)$ is here optimal. Unfortunately the matrices of $T_0$ and $\tilde T_0$ can be very big, even for small $N$ and $L$.

For $N = 1$, we now obtain $L(N) = 2$ so that the two filters $M_0^1$ and $\tilde M_0^{1,2}$ are of small size. We show on figures 14 and 15 the scaling functions and wavelets obtained from such a choice.
Appendix A: A sharp criterion for frame bounds

We want to give here a better result than Theorem 2.2 to characterize the dual filter pairs $(m_0, \tilde m_0)$ which lead to biorthogonal Riesz bases of wavelets. The method that we show here uses the transition operators associated to the positive functions $|m_0|^2$ and $|\tilde m_0|^2$ (see Section II.4.a).

First, recall that $\phi$, $\tilde\phi$, $\psi$ and $\tilde\psi$ are defined by

(A.1)  $\hat\phi(\omega) = \prod_{k=1}^{+\infty} m_0(2^{-k}\omega)$ ,  $\hat\psi(2\omega) = m_1(\omega)\,\hat\phi(\omega)$ ,

and by the analogous formulas with $\tilde m_0$, $\tilde m_1$ for $\tilde\phi$, $\tilde\psi$.

As mentioned in Theorem 2.2 the duality relations (2.19), (2.20) and the decomposition formula (2.21) are ensured as soon as the partial products
$\hat\phi_n(\omega) = \prod_{k=1}^{n} m_0(2^{-k}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$ and $\hat{\tilde\phi}_n(\omega) = \prod_{k=1}^{n} \tilde m_0(2^{-k}\omega)\,\chi_{[-2^n\pi,\,2^n\pi]}(\omega)$ converge in $L^2(\mathbb{R})$ respectively to $\hat\phi(\omega)$ and $\hat{\tilde\phi}(\omega)$.

The main difficulty is then to obtain the frame bounds $A$, $\tilde A$, $B$ and $\tilde B$, all strictly positive, such that for all $f$ in $L^2(\mathbb{R})$,

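(A.2)  $A\,\|f\|^2 \le \sum_{j,k\in\mathbb{Z}} |\langle f|\psi_k^j\rangle|^2 \le B\,\|f\|^2$ ,
       $\tilde A\,\|f\|^2 \le \sum_{j,k\in\mathbb{Z}} |\langle f|\tilde\psi_k^j\rangle|^2 \le \tilde B\,\|f\|^2$ .

It is sufficient to obtain the two upper bounds of (A.2) because the lower bounds are then obtained by using (2.21) and the Schwarz inequality which give

(A.3)  $\|f\|^2 \le \Big(\sum_{j,k\in\mathbb{Z}} |\langle f|\psi_k^j\rangle|^2\Big)^{1/2} \Big(\sum_{j,k\in\mathbb{Z}} |\langle f|\tilde\psi_k^j\rangle|^2\Big)^{1/2}$ .

In [CDF] we used the following assumption

(A.4)  $|\hat\phi(\omega)| + |\hat{\tilde\phi}(\omega)| \le C(1+|\omega|)^{-1/2-\epsilon}$ ,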

which can also be formulated with $\psi$ and $\tilde\psi$ instead of $\phi$ and $\tilde\phi$. Here, we shall prove the $L^2$ convergence of $(\hat\phi_n, \hat{\tilde\phi}_n)_{n>0}$ and the frame inequalities (A.2) using weaker assumptions. More precisely let $T_0$ and $\tilde T_0$ be the two transition operators associated to the functions $|m_0|^2$ and $|\tilde m_0|^2$, as defined in Section II.4.a. They both operate in two spaces of trigonometric polynomials $E_N$ and $E_{\tilde N}$. We have proved in Lemma 2.2 that the subspaces $F_N = \{f \in E_N \,|\, f(0) = 0\}$ and $F_{\tilde N} = \{f \in E_{\tilde N} \,|\, f(0) = 0\}$ are invariant under the action of $T_0$ and $\tilde T_0$. The following result gives a sharp characterization of the dual filter pairs associated to biorthogonal wavelet bases.

Theorem A.1  Let $\lambda$ (resp. $\tilde\lambda$) be the largest eigenvalue of $T_0$ (resp. $\tilde T_0$) in the subspace $F_N$ (resp. $F_{\tilde N}$). Then if $|\lambda|$ and $|\tilde\lambda|$ are both strictly inferior to 1, the functions $\psi$ and $\tilde\psi$ defined by (A.1) generate biorthogonal Riesz bases of wavelets $\{\psi_k^j, \tilde\psi_k^j\}_{j,k\in\mathbb{Z}}$.
Proof:

We shall prove here that this condition on the eigenvalues of $T_0$ and $\tilde T_0$ is sufficient to obtain biorthogonal wavelet bases. In fact, it is also a necessary condition; this result is detailed in [CD].

We first show that $\hat\phi$ and $\hat{\tilde\phi}$ are in $L^2(\mathbb{R})$. As in Theorem 2.1, we apply $T_0^n$ to the function $C_1(\omega) = 1 - \cos\omega$, which is in $F_N$, and by using Lemma 2.5, we obtain

/  < 

< 

< 


C  / 

■  J-^^r 


c 


*)  +  3 


because  >  7.  Since  we  also  have  <  1,  it  follows  that  the  dyadic  blocks  in  the 
Littlewood-Paley  decomposition  of  satisfy  the  inequality 


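(A.5)  $\|\Delta_j(\phi)\|_{L^2} \le C\,2^{-\epsilon j}$  for some $\epsilon > 0$ .

This proves that $\phi$ and $\tilde\phi$ are even better than $L^2$: they belong to a Besov space $B_2^{\epsilon,\infty}$ ($\subset L^2(\mathbb{R})$) for some $\epsilon > 0$. We shall use this property to prove the frame inequalities. Similarly $\psi$ and $\tilde\psi$ belong to $B_2^{\epsilon,\infty}$ for some $\epsilon > 0$. To prove the $L^2$ convergence of the sequence $\hat\phi_n$ to $\hat\phi$, we remark that since $m_0(0) = 1$, for $\alpha$ in $]0,\pi]$ small enough we have

(A.6)  $|\omega| \le \alpha \;\Rightarrow\; |\hat\phi(\omega)| \ge C > 0$ .

We now introduce the sequence $\hat\phi_n^{\alpha}$ defined by

(A.7)  $\hat\phi_n^{\alpha}(\omega) = \prod_{k=1}^{n} m_0(2^{-k}\omega)\,\chi_{[-2^n\alpha,\,2^n\alpha]}(\omega)$ .

It is clear that $\hat\phi_n^{\alpha}$ converges pointwise to $\hat\phi$, but (A.6) also implies $|\hat\phi_n^{\alpha}(\omega)| \le \frac{1}{C}|\hat\phi(\omega)|$ for all $n > 0$. By the Lebesgue dominated convergence theorem we get

(A.8)  $\lim_{n\to+\infty} \|\hat\phi_n^{\alpha} - \hat\phi\|_{L^2} = 0$ .

We now use the hypothesis on the eigenvalues to evaluate the norm of the difference $\hat\phi_n - \hat\phi_n^{\alpha}$:

$\int |\hat\phi_n(\omega) - \hat\phi_n^{\alpha}(\omega)|^2\,d\omega \le \frac{1}{C_1(\alpha)} \int_{-\pi}^{\pi} T_0^n C_1(\omega)\,d\omega \le C\,2^{-\epsilon n} \to 0$ .

Consequently $\hat\phi_n$ converges to $\hat\phi$ in $L^2(\mathbb{R})$ and the same holds for $\hat{\tilde\phi}_n$ and $\hat{\tilde\phi}$.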
It remains to establish the upper frame inequalities in (A.2). We shall obtain them by using the following lemma.

Lemma A.2  Let $v$ be a function in $L^2(\mathbb{R})$ such that for some $\sigma > 0$,

(A.9)  $\sum_{k\in\mathbb{Z}} |\hat v(\omega + 2k\pi)|^{2-\sigma} \le C_1$ ,

(A.10)  $\sum_{j\in\mathbb{Z}} |\hat v(2^j\omega)|^{\sigma} \le C_2$ ,

uniformly in $\omega$. Define, for $j$, $k$ in $\mathbb{Z}$, $v_k^j(x) = 2^{-j/2}\,v(2^{-j}x - k)$. Then, for all $f$ in $L^2(\mathbb{R})$,

(A.11)  $\sum_{j,k\in\mathbb{Z}} |\langle f|v_k^j\rangle|^2 \le C_1 C_2\,\|f\|^2$ .

Let us first assume that this result is true to conclude the proof of the theorem. We thus have to check that there exists a $\sigma > 0$ such that (A.9) and (A.10) are satisfied for $\psi$ and $\tilde\psi$.

To check (A.10), we define $I_j = [-2^{j+1}\pi, -2^j\pi] \cup [2^j\pi, 2^{j+1}\pi]$. For $j \le 1$, we can use the cancellation of $\hat\psi$ at the origin to obtain

(A.12)  $\max_{\omega\in I_j} |\hat\psi(\omega)| \le C\,2^{j}$  for $j \le 1$ .

For $j \ge 2$, we know that $\hat\psi(\pm 2^j\pi) = 0$ since $\hat\phi(2k\pi) = 0$ for $k \in \mathbb{Z} \setminus \{0\}$. We thus have

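$\max_{\omega\in I_j} |\hat\psi(\omega)|^2 \le \int_{I_j} \Big|\frac{d}{d\omega}\big(|\hat\psi(\omega)|^2\big)\Big|\,d\omega \le 2 \Big[\int_{I_j} |\hat\psi(\omega)|^2\,d\omega\Big]^{1/2} \Big[\int_{I_j} \Big|\frac{d\hat\psi}{d\omega}(\omega)\Big|^2\,d\omega\Big]^{1/2}$ .

The first factor can be majorated by $C\,2^{-\epsilon j}$ because we have proved that $\psi$ belongs to $B_2^{\epsilon,\infty}$. The second factor is finite since it is proportional to $\big[\int |x\psi(x)|^2\,dx\big]^{1/2}$ and $\psi$ is a compactly supported $L^2$ function, and consequently

(A.13)  $\max_{\omega\in I_j} |\hat\psi(\omega)| \le C\,2^{-\epsilon j/2}$  for $j \ge 2$ .

Combining (A.12) and (A.13) we see that (A.10) holds for all $\sigma > 0$, since we have

$\sum_{j\in\mathbb{Z}} |\hat\psi(2^j\omega)|^{\sigma} \le \sum_{j\in\mathbb{Z}} \max_{\omega\in I_j} |\hat\psi(\omega)|^{\sigma} \le C \Big[\sum_{j\le 1} 2^{\sigma j} + \sum_{j\ge 2} 2^{-\sigma\epsilon j/2}\Big] \le C_2(\sigma)$ .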
We now check that (A.9) is satisfied for some $\sigma > 0$. Because the wavelet satisfies $\hat\psi(4k\pi) = 0$ for all $k \in \mathbb{Z}$, we can derive


+  /  -(Itf'P 

<  |2  -  trl^  |ti>(u>)|’“ 


{Lj 


du) 


drh 


W;. ) 
Ti/2 


du> 


r  /•  -  V'^  r 

<  |2  -  cri 


t/V 

(L; 


Tl/2 


<Lj 


We already saw that the second factor was finite (in the proof of (A.10)). The first factor is also finite for $\sigma$ small enough. Indeed, using $\psi \in B_2^{\epsilon,\infty}$ and the Hölder inequality, we obtain


/  <  /  )v>(w)p  cU 

Jl,  [4  j 


We thus have to choose $\sigma > 0$ such that $\sigma - 2\epsilon(1-\sigma) < 0$, i.e. $\sigma < \frac{2\epsilon}{1+2\epsilon}$. Since the same results also hold for the dual wavelet $\tilde\psi$, the theorem is proved modulo the Lemma A.2 that we now tackle. Using the Plancherel and the Poisson formulas, we derive for any $f$ in $L^2(\mathbb{R})$

kez  kez 

k€Z  *• 

]  rir 


J 

=  — 

2-  h 


/(2"-’(u.-  +  2£-))  -f  2A-) 
rei 

<  |,i;(u.+  2£7r)lf  |;h(u.S-2£^  cL; 

^  I7  //  ^^^!/(2-^(u;-f  2£-))ni^(u.  +  2£-)rj  +  2£-)|’- 


(Zw 


9-^ 


<  C,  - 


l/(2-^u;)p  iT^(u;)r 


2- 
Summing on all the scales $j \in \mathbb{Z}$ and using (A.10), we get

(A.11)  $\sum_{j,k\in\mathbb{Z}} |\langle f|v_k^j\rangle|^2 \le C_1 C_2\,\|f\|^2$

and this concludes the proof.  ■




Appendix B. Dragonic expansions.

In this Appendix we want to show how the one-dimensional techniques in [DL] can be extended to multidimensional situations. As an example we discuss the two-dimensional case, with the dilation matrix $R$.

A first multidimensional extension of [DL] can be found in [Mo]. Even though he looks at general matrices, Mongeau effectively reduces his analysis to pure dilations by considering the smallest $n$ such that $\tilde D = D^n$ is a multiple of the identity, and rewriting (by iteration) the two-scale equation so that it involves only $\tilde D$. This procedure can drastically increase the number of different terms in the equation. We choose here to work directly with $D = R$ itself.

When the two-scale equation is one-dimensional, and the dilation factor is 2, the regularity at $x$ of the function $\phi$ solving the equation is regulated by the binary expansion of $x$ (for dilation factor $p$, the same role is played by the $p$-ary expansion). Moreover, $\mathbb{R}$, and in particular supp($\phi$), is tiled with integer translates of the interval $[0,1)$, which can be viewed as the set of numbers equal to the decimal part only of their dyadic expansion; if $N$ such tiles are needed to cover support($\phi$), then the two-scale functional equation can be rewritten as an equation for an $N$-dimensional vector-valued function involving two matrices $T_0$ and $T_1$. The spectral properties of $T_0$, $T_1$ then determine the regularity of $\phi$, both local and global [DL].

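In the two-dimensional case with dilation matrix $R$, the role of elementary tile is now played by the twin dragon set $\Delta$. It is defined by

(B.1)  $\Delta = \{ x \in \mathbb{R}^2;\ x = \sum_{j=1}^{\infty} R^{-j} p_j$ where $p_j \in I = \mathbb{Z}^2/R\mathbb{Z}^2 = \{(0,0), (1,0)\} \}$ .

Under the standard identification of $\mathbb{R}^2$ with $\mathbb{C}$, with $(x,y) \simeq x + iy$, $\Delta$ can also be written as

(B.2)  $\Delta = \{ z \in \mathbb{C};\ z = \sum_{j=1}^{\infty} d_j\,(1-i)^{-j}$ where $d_j = 0$ or $1 \}$ .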

This set $\Delta$ is compact, has fractal boundary, is selfsimilar, and its $\mathbb{Z}^2$-translates tile the plane. The indicator function of $\Delta$ is the solution to the two-scale equation

$\phi(x) = \phi(Rx) + \phi(Rx - (1,0))$

(see [GM]). $\Delta$ is called the twin dragon set [K]. We shall give the name dragonic expansions to expansions of $x$ or $z$ as in (B.1), (B.2). Note that (as in the binary case) some points may have two different dragonic expansions, e.g. $.01000\ldots = .1011111\ldots$ (This example also illustrates that addition follows rules very different from the binary case, since $.0100\ldots + .0100\ldots = .1111\ldots$)
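The two identities quoted for dragonic expansions can be verified numerically. The sketch below assumes that, under the identification of the plane with the complex numbers, the digit of rank j carries the weight (1 - i)^(-j); this convention is chosen because it places the expansion .1111... at the point (0,1), as stated later in this appendix, and the identities hold equally for the conjugate base 1 + i. The helper name is hypothetical.

```python
def dragonic_value(digits, tail, base=1 - 1j, n_tail=200):
    """Value of the expansion .d1 d2 ... dn followed by the digit `tail`
    repeated forever, i.e. sum_j d_j * base**(-j). The infinite tail is
    truncated after n_tail terms; since |base| = sqrt(2), the neglected
    remainder is far below double precision."""
    all_digits = list(digits) + [tail] * n_tail
    return sum(d * base ** (-j) for j, d in enumerate(all_digits, start=1))

if __name__ == "__main__":
    a = dragonic_value([0, 1], tail=0)   # .01000...
    b = dragonic_value([1, 0], tail=1)   # .1011111...
    top = dragonic_value([], tail=1)     # .1111...
    print(abs(a - b) < 1e-12)            # two expansions of the same point
    print(abs((a + a) - top) < 1e-12)    # .0100... + .0100... = .1111...
```

With this convention both expansions represent the point (0, 1/2), and their sum is the point (0, 1) represented by the expansion made of ones only.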

Suppose we are interested in various regularity properties of $L^1$-solutions $\phi$ of

(B.3)  $\phi(x) = \sum_{k\in\Lambda} c_k\,\phi(Rx - k)$ ,

where $\Lambda$ is a finite subset of $\mathbb{Z}^2$. Such solutions are uniquely defined up to normalization and have necessarily compact support. One can determine the minimal set $\Gamma \subset \mathbb{Z}^2$ so that $R^{-1}(\Gamma + \Lambda - I) \subset \Gamma$; then support $\phi \subset \Delta + \Gamma$. The equation (B.3) for $\phi$ can be rewritten by defining the $|\Gamma|$-dimensional vector $v(x)$ by

(B.4)  $v_j(x) = \phi(x + j)$ ,  $j \in \Gamma$, $x \in \Delta$ ;

we have

$v_j(x) = \sum_{k\in\Gamma} c_{Rj - k + d_1(x)(1,0)}\,v_k(\tau x)$ ,

where $d_1(x)$ is the first digit in the dragonic expansion of $x$, and $\tau x$ is the point obtained by dropping $d_1(x)$ from the (same) dragonic expansion of $x$, $\tau x = Rx - d_1(x)(1,0)$.

Equation (B.4) can be recast as

(B.5)  $v(x) = T_{d_1(x)}\,v(\tau x)$ ,

$(T_0)_{jk} = c_{Rj-k}$ ,  $(T_1)_{jk} = c_{Rj-k+(1,0)}$ .

We have completed a setup analogous to that of [DL]. The question is now whether the proof techniques of [DL] still work in this case. The answer is basically yes. For instance, we still have

Theorem B.1  Assume that the $c_k$ in (B.3) satisfy

$\sum_{n} c_{Rn} = \sum_{n} c_{Rn+(1,0)} = 1$ .

Then $e_1 = (1, \ldots, 1)$ is a common left eigenvector of $T_0$, $T_1$ with eigenvalue 1 for both matrices. Define $E_1$ to be the $(|\Gamma|-1)$-dimensional subspace orthogonal to $e_1$. If there exist $\lambda < 1$, $C > 0$ so that

(B.6)  $\| T_{d_1} \cdots T_{d_m}|_{E_1} \| \le C\,\lambda^m$

for all possible $d_j = 0$ or $1$, all $m \in \mathbb{N}$, then the $L^1$-solution $\phi$ to (B.3) is Hölder continuous with exponent $\alpha = |\log\lambda| / \log\sqrt{2}$.

This is the analog of Theorem 2.3 in [DL]. Two different strategies of proof are given in [DL]. The first one involves piecewise linear spline approximants; this technique would be hard to generalize here because of the fractal boundary of our domain building blocks $\Delta + k$. A second strategy, which does not use splines at all, but leads to longer proofs, is explained in the Appendix in [DL]; this strategy generalizes to the present case. The main point we have to check to make sure the proof carries over is whether elements that are close necessarily have dragonic expansions with the same starting digits. In the one-dimensional, binary case, if two dyadic rationals $x$, $y$ are closer than $2^{-m}$, $|x - y| < 2^{-m}$, then $x$ and $y$ have binary expansions with coinciding first $m$ digits. (If e.g. $x < y < x + 2^{-m}$, then the expansion "from above" of $x$, ending in all zeros, has the same first $m$ digits as the expansion "from below" of $y$, ending in all ones.) This is crucial in the proof, and allows one to extract Hölder continuity from the condition (B.6). We therefore have to check whether a similar property holds in the "dragonic" case.

By analogy we shall call dragonic rationals all the points in $\Delta$ for which a terminating dragonic expansion can be written. Typically dragonic rationals also have other, non-terminating dragonic expansions. For each dragonic rational $x$ the terminating expansion is unique; we denote its digits by $d_j^0(x)$, $j \in \mathbb{N}$.

Let us also introduce the notations $R_0$, $R_1$,

$R_0 y = Ry$ ,  $R_1 y = Ry - (1,0)$ ,

or $R_d y = Ry - d(1,0)$, with $d = 0$ or $1$ .

Take now a fixed dragonic rational $x$, and assume that $d_j^0(x) = 0$ for $j > J$. All the $y \in \Delta$ that have the same first $J$ digits $d_j^0(x)$, $j \le J$, constitute a little dragon $\Delta_J(x)$ themselves,

$\Delta_J(x) = R_{d_1^0(x)}^{-1} \cdots R_{d_J^0(x)}^{-1}\,\Delta$ ;

$x$ itself is the image of $(0,0)$ under the same map. There are $2^J$ little dragons of the same size as $\Delta_J$, all translates of $\Delta_J$. For every such little dragon, we call the point corresponding to $(0,0)$ the "bottom", and the point corresponding to $(0,1)$ (the only other point in $\mathbb{Z}^2 \cap \Delta$) the "top". If $x$ is a dragonic rational with at most $N$ nonzero digits, then $x$ is the bottom of $\Delta_J(x)$ for all $J \ge N$. (But note that the "orientation" of $\Delta_J(x)$, as indicated by the line connecting bottom and top, changes with $J$!) It follows that $x$ is on the border of these $\Delta_J(x)$. If $x$ is not at the edge of $\Delta$ itself, then there must exist another little dragon $\Delta_J(y)$ so that $x$ is the top of $\Delta_J(y)$ (since $\Delta$ is the union of all the $2^J$ possible dragons $\Delta_J$). Since the top $(0,1)$ of $\Delta$ is given by the expansion $.111111\ldots$, we can therefore find another dragonic expansion for $x$, ending in all ones, and with the same $J$ first digits as $y$,

$d_j^1(x) = d_j^0(y)$ for $j \le J$ ,  $d_j^1(x) = 1$ for $j > J$ .

We have seen how to obtain the two expansions for a dragonic rational $x$. We now want to show that if another dragonic rational $y$ is "close" to $x$, then at least one of its expansions starts with the same digits as one of the expansions for $x$. Define

$\rho = \max \{ r;\ B((0,0); r) \subset \Delta \cup (\Delta - (0,1)) \}$ ,

where $B(y; \lambda)$ is the open Euclidean ball centered at $y$ with radius $\lambda$. Suppose $x$ is a dragonic rational with $d_j^0(x) = 0$ for $j > J$. Take $m \ge J$, and consider the set

$B_m = \{ y \in \Delta;\ |y - x| < \rho\,2^{-m/2} \}$ .
There are two possibilities: either $x$ is on the border $\partial\Delta$ of $\Delta$, or it isn't. If $x \in \partial\Delta$, then

$R_{d_1^0(x)}^{-1} \cdots R_{d_m^0(x)}^{-1}\,(\Delta - (0,1))$

has no common interior points with $\Delta$, so that $B_m \subset \Delta_m(x)$, and the terminating expansions of all $y \in B_m$ have the same first $m$ digits $d_j^0(x)$, $j = 1, \ldots, m$. If $x \notin \partial\Delta$, then $R_{d_1^0(x)}^{-1} \cdots R_{d_m^0(x)}^{-1}\,(\Delta - (0,1)) \subset \Delta$; this set is then a little dragon $\Delta_m(z)$ of which $x$ is the top. In this case $B_m \subset \Delta_m(x) \cup \Delta_m(z)$, so that every point $y \in B_m$ has a dragonic expansion with either the same first $m$ digits as $d^0(x)$ (if $y \in \Delta_m(x)$) or as $d^1(x)$ (if $y \in \Delta_m(z)$). This is the main ingredient needed to make the proof of Theorem 2.3, as sketched in the Appendix in [DL], work in the present case.

One other point that needs checking is whether the existence of two different dragonic expansions for $x$ doesn't lead to inconsistencies for the definition of $v(x)$. If $d_j^0(x) = 0$ for $j > J$, then $d^0(x)$, $d^1(x)$ are linked by

$x = \sum_{j=1}^{N} d_j^0(x)\,R^{-j}(1,0) = \sum_{j=1}^{N} d_j^1(x)\,R^{-j}(1,0) + R^{-N}(0,1)$

for $N \ge J$ arbitrary. One can then compute $v(x)$ in two ways, using the two expansions. The following computation shows that they lead to the same result; for $k \in \Gamma$,

jy—2n 

mj  ...m/i* 

TH)  ...IT) 

=  |^c;j(x)  •  • 

The reader can now check that the proof in [DL] indeed carries over to prove Theorem B.1. Similarly, one can prove differentiability of $\phi$ under stronger conditions on $T_0$, $T_1$, similar to Theorem 3.1 in [DL]. Finally, the same techniques can also be used for local regularity estimates, but these are a bit more tricky, and require further study of the properties of dragonic expansions. In practice, the matrices $T_0|_{E_1}$, $T_1|_{E_1}$ are often too large to permit a rigorous estimate of $\lambda$ in (B.6). However, $\lambda$ is bounded below by the quantities $\rho(T_{d_1} \cdots T_{d_m}|_{E_1})^{1/m}$, and this leads to upper bounds for the Hölder exponent $\alpha$.


Examples. 

1. $g(x) = \tfrac12\, g(Rx + (1,0)) + g(Rx) + \tfrac12\, g(Rx - (1,0))$

The  solution  to  this  equation  is  the  convolution  where  XCi  is  the  indicator 
function of the dragon set A (see also the second remark following Proposition 6.2).
In  this  case  F  has  10  elements.  The  largest  spectral  radius  of  TdlL^.  it^  obtained  for 
d  -  0.  piToiz,)  -  corresponding  to  a  lower  bound  A  >  piTolE^)  in 

(  ).  or  a  Holder  exponent  o  <  .47637  ....  Via  other  methods  (using  the  transition 

operator T of (6.19)) one also derives that this value is a lower bound. This global
Hölder exponent is attained in dragonic rationals, in particular in (0,0).

Note  that  when  Mq  is  positive,  as  in  this  case,  the  transition  operator  T  is  already 
known  to  give  optimal  results.  One  easily  checks  that  the  matrix  representing  T  is  in 
fact  a  submatrix  of  To,  so  that  it  is  not  surprising  that  they  have  a  common  eigenvalue! 

2.  d)(i)  =  Ilo0{R^)  +  /i,(P(A2  -  (1,0))  +  /i2d)(Ai  -  (-1.1))  +  /i3d.(it2r  -  (0,  D)  , 
with  ho  =  .506970418225.  h,  =  -.207072424345,  //:  =  .493029581775. 

//3  =  1.20707242435.  This  is  an  example  from  the  family  described  at  the  very  end 
of  5111. 3. a.  It  leads  to  an  orthonormal  wavelet  basis.  In  this  case  IT)  =  14:  the 
parameters  have  been  chosen  so  that  p{To\Ei  -  )  -  .714.  Plots  of  approx¬ 

imations  to  o  seem  to  suggest  that  cj  might  be  continuous,  but  we  have  no  proof. 
If  it  is.  then  its  Holder  exponent  is  bounded  above  by  logjp^JoTi  If;,  )’''^]/ log  \/2  ~ 
log(.90649)/log\/2  ~  .28327. 
Appendix C. Proof of the inequalities (5.52) for $G(\omega)$.


The  function  G  is  defined  as 

G(u>)  =  cos=-~  +  C0E^~  Sin^—  +  sin  y  sin  — ^ — 


2  ~  -^2 


Hiu)  =  h 


1  /  .  2 

-  Bin  — 

2  V  2 


•  2  ^2 
+  6in’-;i 


h{i)  = 


0  <  t  <  112 


41  1/2  <  1  <  1 


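We want to prove inequalities for $G(\omega)$, $G(\omega)G(D\omega)$ and $G(\omega)G(D\omega)G(D^2\omega)$, where $D(\omega_1,\omega_2)$ is either $(\omega_1+\omega_2,\ \omega_1-\omega_2)$ or $(\omega_1-\omega_2,\ \omega_1+\omega_2)$. (Since $G$ is invariant for the interchange of $\omega_1, \omega_2$, it does not matter which definition of $D$ is taken, $D = R$ or $D = S$.) To prove these inequalities it is convenient to use different variables,

$$ s = s(\omega) = \frac12 \left( \sin^2\frac{\omega_1}{2} + \sin^2\frac{\omega_2}{2} \right), \qquad p = p(\omega) = \sin^2\frac{\omega_1}{2}\, \sin^2\frac{\omega_2}{2} . $$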


We  then  have 


=  t^lzAh[s)  =  2r?(s,p) 
s  -  p 


Moreover, 

$$ s(D\omega) = 2(s - p), \qquad p(D\omega) = 4(s^2 - p) . $$

As $\omega$ ranges over $[-\pi,\pi]^2$, $(s,p)$ fill out the domain $A$ defined by

$$ A = \{ (s,p);\ 0 \le s \le 1,\ \max(0, 2s-1) \le p \le s^2 \} . $$

In terms of these new variables, we therefore want to study $\eta(s,p)$, $\eta(s,p)\,\eta(D(s,p))$ and $\eta(s,p)\,\eta(D(s,p))\,\eta(D^2(s,p))$, for all $(s,p) \in A$, where $D$ is defined by

$$ D(s,p) = (\tilde s, \tilde p) = (2(s-p),\ 4(s^2-p)) . $$

Note that $D$ maps $A$ twice onto itself (both $A \cap \{s \le 1/2\}$ and $A \cap \{s \ge 1/2\}$ get mapped
to  aU  of  A).  Moreover  £>  has  one  fixed  point,  (so.po)  =  (|^  corresponding  to 

77(^0,  ;>o)  =  j|- 


We  shall  prove  that  A  =  A:  U  A2  U  A3,  where 


(C.l) 


nis^p}<(  on  A] 


77(5.p)  77(jC?(s,p))  <  on  A; 
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The  v^ue  o(  <  wil.  he  h«d  bv  oer  e«™.^es  helo-  o.r  ,o.,  U  >o  oU«.  C  <  1. 
Choose  o  =  %/5,  ind  defi"'  ‘’■ 

A.  .  (u,h)€A: 

.<  ,-i  ,>(!-*)  if  -'S  • 

^  -  G  •* 

25^(3  -  s) 


Since 


if  s  >  1/2  , 


=  2(7^ 

we  auiomaiically  have 

T,(i,p)  <  o  on  A)  . 

B,  Che  dehnicio..  of  ,  and  i),  we  have  Co  dUcinga'.sh  fonc  difecen.  regions  when  scndyh,, 
T?2(i,P)  =  r](^^vHi>{s,p)): 


(C.4) 


T)2(S>P)  = 


2(i:-p)  "  4(5 -25^  +  ?) 
25s(1  -  •s) 


s-  p 


if  £  <  1/2,  V<s-  1/4 
if  £  <  1/2,  P  >  5  -  1/4 
if  £  >  1/2,  P  >  w'  -  1/4 


£2(1-5)  _  £^(1  -  •^). 

5  -  p  “  5  -  25^  +  P 

4£2(1 -£)(1- s)  8£^(1--?K-^-p)(J-^^-^~p11  if5>  1/2,  P<5'1/4 

- “  £  -r  p  -  2£* 


£  -  p 


We  define  A  2  by 


^2  =  |(£,p)  €  A;  £  <  1/2,  P>  1^1  2q/'j 

U  {(£,p)  €  A:  £  >  1/2,  P  >  1-8-'  ~  -^1) 


=  A2.I  U  A2.2 


Since  5(1  -  s)  <  1/4  ^  ^ 

Since  moreover  p  >  (l  - 


P2(^-.P)  < 


4  1  2  -  —  -  2£ 

2o 


p) 


- — -  on  all  of  A2.1. 

Ais-Ss'+p) 


-1 


<  a 


on  A2,]  • 
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On  ^2.7  n  {(6,p)  €  A;  V  t  -  I /A),  one  easily  checks  that  r/2(4,p)  = 
satisfies  dj,T}2  ^  0  everywhere.  It  fohows  that  r/2  achieves  its  maximum  on  the  boundary 
of  this  domain,  given  by  the  three  pieces  p  =  ,  with  If^  <  s  <  .9.  p  =  .<■  -  l/A  with 

f/2  <  and  p  =  2a4  -  o^  with  ■“  <  s  <  .9.  One  easily  checks  that  the  maximum 

value  of  f?j  on  this  boundary  is  .9. 

Similarly  one  checks  that  achieves  its  maximum  on  Aj,?  H  {(4,p)  €  A;  p  <  s  -  1/4} 
on  the  boundary  of  this  set;  again  this  leads  to  <  .9. 

It  follows  that 


(C.5) 


^2(4iP)  <  .9  =  o?  on  all  of  Aj  . 


It  remains  to  determine  an  upper  bound  on  t?3(4,p)  =  ri(s,p)r;{b(6,p))r}{b^{s,p))  on 
A  \jAi  U  Aj)  =  {(4,p):  24-1  <  p  <  p  >  s  -  ^4^(1  -  s).  p  <  l.Sf  -  .81}  Since 
-f)  is  strictly  increasing,  we  have  A\(Aj  UAj)  C  A3  =  {(4, p),  24-1  <  p  < 

Pj  =  1.&41  -  .61  <  p  <  1.84  -  .81},  where  4i  is  the  solution  to  4  -  ^4^(1  -  4)  =  I.84-.8I. 
In  A3  one  has  to  distinguish  4  subdomains,  corresponding  to  different  expressions  for  773, 
namely  A3.]  =  A3  n  {p  >  pi,  p  >  24  -  1,  p  <  2(4  -  1/4)^},  A3.2  =  A3  fi  {p  >  2(4  -  l/4)^ 
p  <  4  -  1/4},  A3.3  =  A3  n  {p  >  4  -  1/4,  p  >  2(4  -  1/4)^}  and  A3.4  =  A3n{p>4-l/4, 
p  <  4(4  -  1/4)^}.  On  A3.1,  A3.3  and  A3.4  one  checks  explicitly  that  6^773  d  0.  On  As.j,  the 
exact  expression  for  773  is  too  complicated,  but  one  can  replace  it  by  an  upper  bound, 

4  -  p  4  -  p  .  s-p 


s-p  s-p  's~'p 


164^(1  -  4)j(l  ->) 

4  ~  p 


%(S»P)  • 


This  upper  bound  again  satisfies  ^  0  on  As.j.  It  follows  that  773  on  A3  is  bounded 
by  the  maximum  of  773  on  the  boundaries  of  A3.1,  A3.3,  A3,4  and  of  773  on  the  boundary  of 
A3.2.  Explicitly,  for  all  (4,p)  €  A3, 


(C.6)  $\eta_3(s,p) \le \eta_3(s_1,p_1) = .88145650226\ldots$

This numerical upper bound is larger than $(.9)^{3/2}$; it follows therefore from (C.4) and (C.5)
that we have proved (C.1)-(C.2) for $C = [\eta_3(s_1,p_1)]^{1/3} \simeq .958812370442\ldots$
Scale and Inverse Frequency
Representations

Leon Cohen*

CAIP Center, Rutgers University, Piscataway, New Jersey 08855-1390

It  is  shown  that  there  are  two  basic  ideas  regarding  scale.  One  is  inverse  frequency  and  the 
other  is  scaling  of  frequency  functions.  An  explicit  expression  for  the  scaling  operator  is  given 
and  its  general  properties  are  derived.  A  general  approach  for  obtaining  joint  scale  representations 
is  presented.  Joint  representations  of  scale  and  time  are  obtained  and  a  method  to  generate  an 
infinite  number  of  such  distributions  is  given.  The  results  of  Bertrand  and  Bertrand,  Marinovic 
and  Altes  are  obtained  as  special  cases  of  joint  distributions  involving  scale  and  inverse  frequency. 
The expression for instantaneous scale is derived. The uncertainty principle involving scale and
time and the minimum time-scale uncertainty signal are obtained.

1. INTRODUCTION AND GENERAL APPROACH

Our  aim  is  to  study  scale  and  describe  a  general 
method  for  obtaining  joint  representations  of  scale 
with  time  or  frequency.  The  fundamental  idea  is 
to  use  the  characteristic  function  method  which  we 
now  describe.  The  characteristic  function  of  the 
joint distribution, $P(a,b)$, is

$$ M(\xi,\zeta) = \iint e^{j\xi a + j\zeta b}\, P(a,b)\, da\, db . \tag{1.1} $$

The distribution is obtained by

$$ P(a,b) = \frac{1}{4\pi^2} \iint e^{-j\xi a - j\zeta b}\, M(\xi,\zeta)\, d\xi\, d\zeta . \tag{1.2} $$

Therefore, if there were a way to obtain the characteristic function we could find the distribution function. The characteristic function is an average, namely the average of $e^{j\xi a + j\zeta b}$,

$$ M(\xi,\zeta) = \langle e^{j\xi a + j\zeta b} \rangle \tag{1.3} $$

and generally cannot be obtained without knowing the distribution. However, the essence of the method we present is that indeed one can calculate the characteristic function directly from the signal. This is done by associating operators with physical quantities, a method first proposed by Gabor and Ville for time and frequency and generalized by others.

One calculates averages by “sandwiching” the operator between the signal and its complex conjugate. In particular, if a quantity is represented by the operator $\mathcal{C}(\mathcal{T},\mathcal{W})$ where $\mathcal{T}$ and $\mathcal{W}$ are the time and frequency operators then its average value is obtained by way of

$$ \langle \mathcal{C} \rangle = \int s^*(t)\, \mathcal{C}(\mathcal{T},\mathcal{W})\, s(t)\, dt \tag{1.4} $$

where $s(t)$ is the signal. (Operators will be denoted in calligraphic letters. We generally assume that they are Hermitian and that their eigenfunctions are normalized to a delta function.)

To apply this to our case suppose we have two quantities $a$ and $b$ and suppose these two quantities are represented by the operators $\mathcal{A}$ and $\mathcal{B}$. We define the characteristic function operator by

$$ \mathcal{M}(\xi,\zeta) = e^{j\xi\mathcal{A} + j\zeta\mathcal{B}} \tag{1.5} $$

and the characteristic function is then its average,

$$ M(\xi,\zeta) = \langle \mathcal{M}(\xi,\zeta) \rangle . \tag{1.6} $$

Using Eq. (1.4) we take

$$ M(\xi,\zeta) = \int s^*(t)\, e^{j\xi\mathcal{A} + j\zeta\mathcal{B}}\, s(t)\, dt . \tag{1.7} $$

In Eq. (1.7) $\mathcal{A}$ and $\mathcal{B}$ have to be expressed in the time representation. However any other representation can be used. In particular, in the frequency representation,

$$ M(\xi,\zeta) = \int S^*(\omega)\, e^{j\xi\mathcal{A} + j\zeta\mathcal{B}}\, S(\omega)\, d\omega \tag{1.8} $$

where $S(\omega)$ is the Fourier transform of $s(t)$,

$$ S(\omega) = \frac{1}{\sqrt{2\pi}} \int s(t)\, e^{-j\omega t}\, dt . \tag{1.9} $$

In that case we have to express the operators in terms of frequency variables.

Once $M(\xi,\zeta)$ is obtained the distribution is calculated by Eq. (1.2). That is the basis of our method. We now face two issues. First we must find the scale operator and secondly we must have ways of evaluating quantities like Eq. (1.7). We address these two issues in the next two Sections.

We also point out that because $\mathcal{A}$ and $\mathcal{B}$ are operators there is an ordering problem. For example, one can take the characteristic function operator to be

$$ \mathcal{M}(\xi,\zeta) = e^{j\xi\mathcal{A}}\, e^{j\zeta\mathcal{B}} . \tag{1.10} $$

Since in general operators do not commute this will give a different answer than Eq. (1.7). For each different ordering a different characteristic function is obtained and hence a different distribution. This is the reason why we have an infinite number of distributions [6].
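The ordering problem can be seen numerically on small matrices. The following is a minimal sketch, not taken from the paper: the 2x2 Hermitian matrices `A`, `B`, `C`, the parameter values and all helper names are hypothetical stand-ins for the operators, chosen only to show that the exponential of a sum and the product of exponentials differ when the operators do not commute, and agree when they do.

```python
# Illustrative stand-ins only: small Hermitian matrices play the role of operators.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def expm(m, terms=40):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    n = len(m)
    result = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, m))   # term_k = term_{k-1} m / k
        result = add(result, term)
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[0, 1], [1, 0]]     # does not commute with B
B = [[1, 0], [0, -1]]
C = [[2, 0], [0, 5]]     # diagonal, commutes with B
xi, zeta = 0.7, 0.4

# Ordering as in Eq. (1.5): exponential of the sum.
joint = expm(add(scale(1j * xi, A), scale(1j * zeta, B)))
# Ordering as in Eq. (1.10): product of the exponentials.
split = matmul(expm(scale(1j * xi, A)), expm(scale(1j * zeta, B)))
ordering_gap = max_abs_diff(joint, split)

# For a commuting pair the two orderings coincide.
joint_c = expm(add(scale(1j * xi, B), scale(1j * zeta, C)))
split_c = matmul(expm(scale(1j * xi, B)), expm(scale(1j * zeta, C)))
commuting_gap = max_abs_diff(joint_c, split_c)

print(ordering_gap, commuting_gap)
```

The gap for the non-commuting pair is of order one, which is why each ordering yields a different characteristic function and hence a different distribution.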


2.  SCALE  OPERATORS 

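The fundamental operators are the time and frequency operators, $\mathcal{T}$ and $\mathcal{W}$, which in the time and frequency representation are given by [7]

$$ \mathcal{T} \to t \ ; \qquad \mathcal{W} \to -j\frac{d}{dt} \qquad \text{(time domain)} \tag{2.1} $$

$$ \mathcal{T} \to j\frac{d}{d\omega} \ ; \qquad \mathcal{W} \to \omega \qquad \text{(frequency domain)} . \tag{2.2} $$

They satisfy the commutation relation

$$ [\mathcal{T},\mathcal{W}] = \mathcal{T}\mathcal{W} - \mathcal{W}\mathcal{T} = j . \tag{2.3} $$

The frequency operator translates a time function, $f(t)$, according to

$$ e^{j\tau\mathcal{W}} f(t) = f(t + \tau) \tag{2.4} $$

where $\tau$ is any number.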

A.  Time  scale  operator 

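We propose the following operator for time scale

$$ \mathcal{S} = \tfrac12 \left( \mathcal{T}\mathcal{W} + \mathcal{W}\mathcal{T} \right) . \tag{2.5} $$

The reason for calling $\mathcal{S}$ a scale operator is because of the following action on time functions, $f(t)$,

$$ e^{j\sigma\mathcal{S}} f(t) = e^{\sigma/2} f(e^{\sigma} t) , \tag{2.6} $$

where $\sigma$ is any number parameter. That is, $e^{j\sigma\mathcal{S}}$ scales time functions in analogy with $e^{j\tau\mathcal{W}}$ which translates functions in time. The scaling factor is $e^{\sigma}$. Note that $\mathcal{S}$ is Hermitian and that $e^{j\sigma\mathcal{S}}$ is unitary as is the case with the frequency operator.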
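As a quick numerical illustration of unitarity, the sketch below takes the action of the scaling operator on a time function to be $f(t) \to e^{\sigma/2} f(e^{\sigma} t)$ and checks that the signal energy is unchanged. The Gaussian test signal, the integration window and the function names are assumptions made for illustration only.

```python
import math

def f(t):
    """Example signal (an assumption): a Gaussian with energy sqrt(pi/2)."""
    return math.exp(-t * t)

def scaled(t, sigma):
    """Assumed action of the scaling operator: e^{sigma/2} f(e^{sigma} t)."""
    return math.exp(sigma / 2.0) * f(math.exp(sigma) * t)

def energy(g, lo=-30.0, hi=30.0, n=200_000):
    """Midpoint-rule approximation of the integral of |g(t)|^2."""
    dt = (hi - lo) / n
    return sum(abs(g(lo + (i + 0.5) * dt)) ** 2 for i in range(n)) * dt

e0 = energy(f)
e1 = energy(lambda t: scaled(t, 0.8))    # compressed in time
e2 = energy(lambda t: scaled(t, -1.3))   # stretched in time
print(e0, e1, e2)
```

The factor $e^{\sigma/2}$ is exactly what compensates the change of time axis, so all three energies agree up to integration error.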

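We list some properties of the time scaling operator which we will subsequently use [8]. First we note that it can be written in the following alternate ways

$$ \mathcal{S} = \mathcal{T}\mathcal{W} - \tfrac12 j = \mathcal{W}\mathcal{T} + \tfrac12 j . \tag{2.7} $$

The commutation relations of the scale operator with the time and frequency operators are

$$ [\mathcal{T},\mathcal{S}] = j\mathcal{T} , \tag{2.8} $$

$$ [\mathcal{W},\mathcal{S}] = -j\mathcal{W} . \tag{2.9} $$

In addition the following commutation relations will turn out to be important

$$ [\mathcal{T},[\mathcal{T},\mathcal{S}]] = 0 , \tag{2.10} $$

$$ [\mathcal{W},[\mathcal{W},\mathcal{S}]] = 0 , \tag{2.11} $$

$$ [\mathcal{S},[\mathcal{T},\mathcal{S}]] = \mathcal{T} , \tag{2.12} $$

$$ [\mathcal{S},[\mathcal{W},\mathcal{S}]] = \mathcal{W} . \tag{2.13} $$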
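These commutation relations can be checked mechanically by letting the operators act on polynomials in $t$. The sketch below is only an illustration under stated assumptions: it uses the time-domain forms (multiplication by $t$ for the time operator, $-j\,d/dt$ for the frequency operator) and the symmetrized product for the scale operator; the coefficient-list representation and all function names are hypothetical.

```python
# A polynomial p(t) = p[0] + p[1] t + p[2] t^2 + ... is stored as a coefficient list.

def T(p):
    """Time operator: multiply by t."""
    return [0j] + [complex(c) for c in p]

def W(p):
    """Frequency operator in the time domain: -j d/dt."""
    return [-1j * k * complex(p[k]) for k in range(1, len(p))] or [0j]

def lin(p, q, cq=1.0):
    """Return p + cq*q for coefficient lists of possibly different length."""
    n = max(len(p), len(q))
    p = list(p) + [0j] * (n - len(p))
    q = list(q) + [0j] * (n - len(q))
    return [a + cq * b for a, b in zip(p, q)]

def S(p):
    """Scale operator taken as (TW + WT)/2."""
    return [0.5 * c for c in lin(T(W(p)), W(T(p)))]

def comm(X, Y):
    """Commutator [X, Y] returned as a new operator."""
    return lambda p: lin(X(Y(p)), Y(X(p)), -1.0)

def same(p, q, tol=1e-12):
    return all(abs(c) < tol for c in lin(p, q, -1.0))

p = [1, 2, -3, 0.5]          # arbitrary test polynomial
checks = {
    "[T,W] = j":     same(comm(T, W)(p), [1j * c for c in p]),
    "[T,S] = jT":    same(comm(T, S)(p), [1j * c for c in T(p)]),
    "[W,S] = -jW":   same(comm(W, S)(p), [-1j * c for c in W(p)]),
    "[T,[T,S]] = 0": same(comm(T, comm(T, S))(p), [0j]),
    "[S,[T,S]] = T": same(comm(S, comm(T, S))(p), T(p)),
    "[S,[W,S]] = W": same(comm(S, comm(W, S))(p), W(p)),
}
print(checks)
```

A polynomial test function is enough here because every relation is an identity between differential operators with polynomial coefficients.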

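The time scale operator has the following effect on functions of frequency, $F(\omega)$,

$$ e^{j\sigma\mathcal{S}} F(\omega) = e^{-\sigma/2} F(e^{-\sigma}\omega) . \tag{2.14} $$

B. Frequency Scale Operator

Eq. (2.14) suggests that the frequency scale operator is appropriately defined by

$$ \mathcal{S}_F = -\mathcal{S} \tag{2.15} $$

$$ \phantom{\mathcal{S}_F} = -\tfrac12 \left( \mathcal{T}\mathcal{W} + \mathcal{W}\mathcal{T} \right) \tag{2.16} $$

$$ \phantom{\mathcal{S}_F} = \tfrac12 j - \mathcal{T}\mathcal{W} = -\mathcal{W}\mathcal{T} - \tfrac12 j . \tag{2.17} $$

Its operation on frequency and time functions is

$$ e^{j\sigma\mathcal{S}_F} F(\omega) = e^{\sigma/2} F(e^{\sigma}\omega) , \tag{2.18} $$

$$ e^{j\sigma\mathcal{S}_F} f(t) = e^{-\sigma/2} f(e^{-\sigma} t) . \tag{2.19} $$

The time and frequency scale operators are simply related to each other and hence once relations are obtained for one of the operators it will be easy to transcribe to the other.

The relevant commutation relations for the frequency scale operator are

$$ [\mathcal{T},\mathcal{S}_F] = -j\mathcal{T} , \tag{2.20} $$

$$ [\mathcal{W},\mathcal{S}_F] = j\mathcal{W} , \tag{2.21} $$

$$ [\mathcal{T},[\mathcal{T},\mathcal{S}_F]] = 0 , \tag{2.22} $$

$$ [\mathcal{W},[\mathcal{W},\mathcal{S}_F]] = 0 , \tag{2.23} $$

$$ [\mathcal{S}_F,[\mathcal{T},\mathcal{S}_F]] = \mathcal{T} , \tag{2.24} $$

$$ [\mathcal{S}_F,[\mathcal{W},\mathcal{S}_F]] = \mathcal{W} . \tag{2.25} $$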

3.  METHOD  OF  EVALUATION 

There are two general methods that have been devised to evaluate expressions like Eq. (1.7). The first is by direct evaluation as per the following prescription [16,8]

$$ e^{j\xi\mathcal{A} + j\zeta\mathcal{B}}\, s(t) = \int G(t,t')\, s(t')\, dt' \tag{3.1} $$

where

$$ G(t,t') = \int e^{j\lambda}\, u^*(\lambda,t')\, u(\lambda,t)\, d\lambda \tag{3.2} $$

and where $\lambda$ and $u$ are the eigenvalues and eigenfunctions of

$$ \{ \xi\mathcal{A} + \zeta\mathcal{B} \}\, u(\lambda,t) = \lambda\, u(\lambda,t) . \tag{3.3} $$

The second approach is simplification of the operator $e^{j\xi\mathcal{A} + j\zeta\mathcal{B}}$. This problem arises in many fields and has not been solved generally. However simplification is possible for special cases. The most common special case is when $\mathcal{A}$ and $\mathcal{B}$ commute with their commutator, $[\mathcal{A},\mathcal{B}]$. That is the case for time and frequency. It will not be the case for the scale operator; however, as can be seen from Eq. (2.12-2.13) the scale operator and time operator satisfy a commutation relation of the form

$$ [\mathcal{A},\mathcal{B}] = \beta + \alpha\mathcal{A} . \tag{3.4} $$
For this case [8]

$$ e^{j\xi\mathcal{A} + j\zeta\mathcal{B}} = e^{j\mu\beta/\alpha}\, e^{j\mu\mathcal{A}}\, e^{j\zeta\mathcal{B}}\, e^{j\xi\mathcal{A}} \tag{3.5} $$

where

$$ \mu = \frac{\xi}{j\zeta\alpha} \left\{ 1 - (1 + j\zeta\alpha)\, e^{-j\zeta\alpha} \right\} . \tag{3.6} $$


4. JOINT DISTRIBUTIONS WITH SCALE

The concept of a time-scale representation was first considered by Bertrand and Bertrand [1-3] in their pioneering work. They derived a particular distribution using group theory and generalized to the wide band ambiguity function. Subsequently Marinovic [13] and Altes [1] considered the issue from different perspectives and also derived a distribution. It is different from the one obtained by Bertrand and Bertrand. We shall show that these two distributions are fundamentally different. One deals with scaling as defined by the scaling operator and the other deals with inverse frequency. Inverse frequency will be discussed in Section (9).

We now derive a distribution of time and scale using Eq. (1.7) for the characteristic function operator. We shall use $t$ and $a$ to denote time and scale respectively and use $\theta$ and $\sigma$ for the corresponding variables in the characteristic function. The characteristic function is

$$ M(\theta,\sigma) = \int s^*(t)\, e^{j\theta\mathcal{T} + j\sigma\mathcal{S}_F}\, s(t)\, dt . \tag{4.1} $$

Using Eq. (3.5) this becomes

$$ M(\theta,\sigma) = \int s^*(t)\, e^{j\mu\mathcal{T}}\, e^{j\sigma\mathcal{S}_F}\, e^{j\theta\mathcal{T}}\, s(t)\, dt \tag{4.2} $$

where, since $\beta = 0$ and $\alpha = -j$ by Eq. (2.20),

$$ \mu = \frac{\theta}{\sigma} \left\{ 1 - (1 + \sigma)\, e^{-\sigma} \right\} . \tag{4.3} $$

Straightforward algebra reduces this to

$$ M(\theta,\sigma) = \int s^*(e^{\sigma/2} t)\, e^{\,j 2\theta t \sinh(\sigma/2)/\sigma}\, s(e^{-\sigma/2} t)\, dt . \tag{4.4} $$

To obtain the distribution we use

$$ P(t,a) = \frac{1}{4\pi^2} \iint M(\theta,\sigma)\, e^{-j\theta t - j\sigma a}\, d\theta\, d\sigma . \tag{4.5} $$

Substituting Eq. (4.4) for the characteristic function, we obtain, after some simplification, that

$$ P(t,a) = \frac{1}{2\pi} \int \frac{\sigma}{2\sinh(\sigma/2)}\, e^{-j\sigma a}\, s^*\!\left( e^{\sigma/2}\, \frac{\sigma t}{2\sinh(\sigma/2)} \right) s\!\left( e^{-\sigma/2}\, \frac{\sigma t}{2\sinh(\sigma/2)} \right) d\sigma . \tag{4.6} $$

This is a joint representation for time and scale.

A. Marginals

The marginals of a joint distribution are the densities of the individual variables. To obtain the time marginal we integrate out scale

$$ \int P(t,a)\, da = |s(t)|^2 . \tag{4.7} $$

This result is expected since $|s(t)|^2$ is the time density. To obtain the density of scale we integrate out time. The result is

$$ \int P(t,a)\, dt = \left| \frac{1}{\sqrt{2\pi}} \int s(e^{x})\, e^{x/2}\, e^{jax}\, dx \right|^2 . \tag{4.8} $$

This density for scale will be discussed further in Section (7).

B. Other Distributions

Suppose we use the following characteristic function operator,

$$ \mathcal{M}(\theta,\sigma) = e^{j\sigma\mathcal{S}_F/2}\, e^{j\theta\mathcal{T}}\, e^{j\sigma\mathcal{S}_F/2} \tag{4.9} $$

to obtain a distribution. The characteristic function is then

$$ M(\theta,\sigma) = \int s^*(t)\, e^{j\sigma\mathcal{S}_F/2}\, e^{j\theta\mathcal{T}}\, e^{j\sigma\mathcal{S}_F/2}\, s(t)\, dt \tag{4.10} $$

and direct evaluation leads to

$$ M(\theta,\sigma) = \int s^*(e^{\sigma/2} t)\, e^{j\theta t}\, s(e^{-\sigma/2} t)\, dt . \tag{4.11} $$

The distribution is

$$ P(t,a) = \frac{1}{4\pi^2} \iint M(\theta,\sigma)\, e^{-j\theta t - j\sigma a}\, d\theta\, d\sigma \tag{4.12} $$

which gives

$$ P(t,a) = \frac{1}{2\pi} \int s^*(e^{\sigma/2} t)\, e^{-j\sigma a}\, s(e^{-\sigma/2} t)\, d\sigma . \tag{4.13} $$

This distribution was derived by Marinovic [13] and Altes [1]. It also satisfies the marginals, Eqs. (4.7-4.8).

How is it that we may obtain two different distributions? In fact, we will obtain an infinite number of distributions. The reason is that different orderings of the operators produce different distributions. However all these distributions are connected as we discuss in the next Section.


5.  GENERAL  CLASS  WITH  SCALE 

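As in the time frequency case we can obtain a general class of time scale distributions. One defines a generalized characteristic function by

$$ M_G(\theta,\sigma) = \phi(\theta,\sigma)\, M(\theta,\sigma) \tag{5.1} $$

where $M(\theta,\sigma)$ is any particular characteristic function and $\phi(\theta,\sigma)$ is the kernel. Choosing different kernels produces different distributions. The general class is [6,7,5]

$$ P(t,a) = \frac{1}{4\pi^2} \iint M_G(\theta,\sigma)\, e^{-j\theta t - j\sigma a}\, d\theta\, d\sigma \tag{5.2} $$

$$ \phantom{P(t,a)} = \frac{1}{4\pi^2} \iint \phi(\theta,\sigma)\, M(\theta,\sigma)\, e^{-j\theta t - j\sigma a}\, d\theta\, d\sigma . \tag{5.3} $$

Which particular characteristic function is chosen as the base is not important; the choice is a matter of convenience. Suppose we choose the one given by Eq. (4.4). Then

$$ P(t,a) = \frac{1}{4\pi^2} \iiint s^*(e^{\sigma/2} u)\, \phi(\theta,\sigma)\, e^{\,j 2\theta u \sinh(\sigma/2)/\sigma - j\theta t - j\sigma a}\, s(e^{-\sigma/2} u)\, d\theta\, du\, d\sigma . \tag{5.4} $$

If we choose the one given by Eq. (4.11) then

$$ P(t,a) = \frac{1}{4\pi^2} \iiint s^*(e^{\sigma/2} u)\, e^{\,j\theta(u - t) - j\sigma a}\, \phi(\theta,\sigma)\, s(e^{-\sigma/2} u)\, d\theta\, du\, d\sigma . \tag{5.5} $$

Eq. (5.4) or (5.5) may be considered a general class and any other distribution is obtained by appropriate choice of the kernel.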


6.  INSTANTANEOUS  SCALE 

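Since Eq. (1.4) allows us to calculate the average of any function we can use it to calculate the average scale,

$$ \langle \mathcal{S}_F \rangle = \int s^*(t)\, \mathcal{S}_F\, s(t)\, dt . \tag{6.1} $$

If we express the signal in terms of the amplitude and phase

$$ s(t) = A(t)\, e^{j\varphi(t)} , \tag{6.2} $$

then†

$$ \mathcal{S}_F\, s(t) = \left[ \left( tA'/A + \tfrac12 \right) j - t\varphi'(t) \right] s(t) . \tag{6.3} $$

Substituting in Eq. (6.1) one straightforwardly obtains

$$ \langle \mathcal{S}_F \rangle = -\int t\, \varphi'(t)\, A^2(t)\, dt . \tag{6.4} $$

For the scale bandwidth, $B$, we have

$$ B^2 = \int s^*(t)\, (\mathcal{S}_F - \langle \mathcal{S}_F \rangle)^2\, s(t)\, dt \tag{6.5} $$

$$ \phantom{B^2} = \int \left| (\mathcal{S}_F - \langle \mathcal{S}_F \rangle)\, s(t) \right|^2 dt \tag{6.6} $$

$$ \phantom{B^2} = \langle \mathcal{S}_F^2 \rangle - \langle \mathcal{S}_F \rangle^2 . \tag{6.7} $$

Direct calculation leads to

$$ B^2 = \int \left( \frac{tA'(t)}{A(t)} + \frac12 \right)^2 A^2(t)\, dt + \int \left( t\varphi'(t) + \langle \mathcal{S}_F \rangle \right)^2 A^2(t)\, dt . \tag{6.8} $$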

This equation can be interpreted in the following way. Consider any two variables, $x$ and $y$, which have a joint density. The standard deviation of $y$ can be expressed in terms of the conditional standard deviation, $\sigma_{y|x}$, and the conditional average $\langle y \rangle_x$. The relation is [9,10]

$$ \sigma_y^2 = \int \sigma_{y|x}^2\, P(x)\, dx + \int \left( \langle y \rangle_x - \langle y \rangle \right)^2 P(x)\, dx \tag{6.9} $$

where $P(x)$ is the density of $x$. The conditional value is what is commonly called an instantaneous or local value. Comparing Eq. (6.8) with Eq. (6.9) suggests that we take for instantaneous scale

$$ a_t = -t\, \varphi'(t) . \tag{6.10} $$

† Primes denote differentiation with respect to the argument.

The  average  scale  is  given  by  averaging  the  instan¬ 
taneous  scale  over  all  time  as  per  Eq.  (6.4). 
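A small numerical sketch may help fix ideas. It assumes the instantaneous scale is $-t\,\varphi'(t)$ and that the average scale is that quantity weighted by $A^2(t)$; the logarithmic phase, the amplitude and the constant `C_SCALE` below are hypothetical choices for which the instantaneous scale is the same at every time.

```python
import math

C_SCALE = 1.5   # hypothetical constant; the phase below is -C_SCALE * ln t

def amplitude(t):
    """A(t) = 2 t exp(-t), chosen so that the integral of A^2 over t > 0 equals 1."""
    return 2.0 * t * math.exp(-t)

def phase(t):
    return -C_SCALE * math.log(t)

def instantaneous_scale(t, rel_step=1e-6):
    """-t * phase'(t), with the derivative taken by a central difference."""
    h = rel_step * t
    dphi = (phase(t + h) - phase(t - h)) / (2.0 * h)
    return -t * dphi

def average_scale(hi=60.0, n=60000):
    """Midpoint-rule estimate of the integral of instantaneous_scale(t) * A(t)^2."""
    dt = hi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += instantaneous_scale(t) * amplitude(t) ** 2
    return total * dt

samples = [instantaneous_scale(t) for t in (0.5, 1.0, 4.0)]
mean_scale = average_scale()
print(samples, mean_scale)
```

A logarithmic phase plays the same role for scale that a linear phase plays for frequency: it gives a signal whose local value does not change with time.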

Also, the conditional spread of scale for a given time may be taken to be

$$ \sigma_{a|t}^2 = \left( \frac{tA'(t)}{A(t)} + \frac12 \right)^2 . \tag{6.11} $$

Similar to the above considerations one can show that scale for a given frequency is given by

$$ a_\omega = \omega\, \psi'(\omega) , \tag{6.12} $$

where $\psi$ is the phase of the spectrum.

7.  EIGENFUNCTIONS  AND  EIGENVALUES 
OF  SCALE 

The significance of solving the eigenvalue problem for an operator is that it gives a means of obtaining the density for that quantity. Also the eigenvalues give the range of the possible values. For scale, the eigenvalue problem is

$$ \mathcal{S}_F\, \eta(a,t) = a\, \eta(a,t) . \tag{7.1} $$

Since the scale operator is Hermitian the eigenvalues will be real and eigenfunctions complete. Therefore any function $s(t)$ can be expanded in terms of the eigenfunctions

$$ s(t) = \int F(a)\, \eta(a,t)\, da \tag{7.2} $$

with the inverse transformation

$$ F(a) = \int s(t)\, \eta^*(a,t)\, dt . \tag{7.3} $$
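The density of scale is then

$$ P(a) = |F(a)|^2 . \tag{7.4} $$

In the time representation the scale eigenvalue problem becomes

$$ j \left( t\frac{d}{dt} + \frac12 \right) \eta(a,t) = a\, \eta(a,t) \tag{7.5} $$

and the solution is

$$ \eta(a,t) = \frac{1}{\sqrt{2\pi}}\, \frac{e^{-ja\log t}}{\sqrt{t}} . \tag{7.6} $$

These satisfy the completeness relations

$$ \int_0^\infty \eta^*(a',t)\, \eta(a,t)\, dt = \delta(a - a') , \tag{7.7} $$

$$ \int \eta^*(a,t')\, \eta(a,t)\, da = \delta(t - t') . \tag{7.8} $$

From Eq. (7.3) we have that for any signal

$$ F(a) = \int_0^\infty s(t)\, \eta^*(a,t)\, dt \tag{7.9} $$

$$ \phantom{F(a)} = \frac{1}{\sqrt{2\pi}} \int_0^\infty s(t)\, \frac{e^{ja\log t}}{\sqrt{t}}\, dt . \tag{7.10} $$

The density of scale is then

$$ P(a) = |F(a)|^2 \tag{7.11} $$

$$ \phantom{P(a)} = \left| \frac{1}{\sqrt{2\pi}} \int_0^\infty s(t)\, \frac{e^{ja\log t}}{\sqrt{t}}\, dt \right|^2 \tag{7.12} $$

$$ \phantom{P(a)} = \frac{1}{2\pi} \left| \int_0^\infty s(t)\, \frac{e^{ja\log t}}{\sqrt{t}}\, dt \right|^2 , \tag{7.13} $$

which is the same as Eq. (4.8) which was obtained as one of the marginals of the joint distribution of time and scale.

8. UNCERTAINTY PRINCIPLE FOR SCALE

For arbitrary operators the uncertainty principle is

$$ \Delta\mathcal{A}\, \Delta\mathcal{B} \ge \tfrac12 \left| \langle [\mathcal{A},\mathcal{B}] \rangle \right| , \tag{8.1} $$

where $\Delta\mathcal{A}$ is the standard deviation of $\mathcal{A}$, given by

$$ (\Delta\mathcal{A})^2 = \int s^*(t)\, (\mathcal{A} - \langle \mathcal{A} \rangle)^2\, s(t)\, dt \tag{8.2} $$

$$ \phantom{(\Delta\mathcal{A})^2} = \int \left| (\mathcal{A} - \langle \mathcal{A} \rangle)\, s(t) \right|^2 dt \tag{8.3} $$

$$ \phantom{(\Delta\mathcal{A})^2} = \langle \mathcal{A}^2 \rangle - \langle \mathcal{A} \rangle^2 . \tag{8.4} $$

Similarly for $\mathcal{B}$.

For the case of time and scale we have

$$ \Delta\mathcal{T}\, \Delta\mathcal{S}_F \ge \tfrac12 \left| \langle [\mathcal{T},\mathcal{S}_F] \rangle \right| . \tag{8.5} $$

Using Eq. (2.8) this gives

$$ \Delta\mathcal{T}\, \Delta\mathcal{S}_F \ge \tfrac12 \left| \langle t \rangle \right| \tag{8.6} $$

where $\langle t \rangle$ is the mean time.

One can obtain the signal which minimizes the uncertainty product. In general one solves the following equation

$$ (\mathcal{B} - \langle \mathcal{B} \rangle)\, s(t) = \lambda\, (\mathcal{A} - \langle \mathcal{A} \rangle)\, s(t) \tag{8.7} $$

where

$$ \lambda = \frac{\langle [\mathcal{A},\mathcal{B}] \rangle}{2(\Delta\mathcal{A})^2} . \tag{8.8} $$

Specializing to time and scale

$$ \left[ j \left( t\frac{d}{dt} + \frac12 \right) - \langle \mathcal{S}_F \rangle \right] s(t) = \lambda\, (t - \langle t \rangle)\, s(t) \tag{8.9} $$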
where

$$ \lambda = \frac{\langle [\mathcal{T},\mathcal{S}_F] \rangle}{2(\Delta t)^2} = -j\, \frac{\langle t \rangle}{2(\Delta t)^2} \tag{8.10} $$

and where $\langle t \rangle$ and $\Delta t$ are the mean time and duration respectively. The quantity $\langle \mathcal{S}_F \rangle$ is the mean scale given by Eq. (6.4). The solution of Eq. (8.9) is

where $\omega_0$ is some reference frequency. We shall use $r$ to signify inverse frequency values to keep a distinction between inverse frequency and scale, where we have used $a$. We now give some properties of the operator $\mathcal{R}$.

The commutator of the inverse frequency operator with time is

$$ [\mathcal{T},\mathcal{R}] = -\frac{j}{\omega_0}\, \mathcal{R}^2 . \tag{9.2} $$


s(t)  =  C  (g  JJ) 


where  c  is  a  normalizing  constant  and 


2(At)2 


(8.12) 


= K(W  - 


and  the  uncertainty  principle  for  time  and  inverse 
frequency  is  therefore 


AT  A7£>  l|([T,wo>V-’])l  =  ^(4r)  (9.3) 

^  2,  u- 


Also  the  minimum  uncertainty  signal  is  easily  de¬ 
rived.  We  do  it  in  the  frequency  representation. 
Using  Eq.  (8.7)  we  have  that  the  minimum  time- 
inverse  frequency  signal,  F(w),  must  satisfy 


We  now  address  what  minimizes  frequency  and 
frequency  scale  uncertainty.  The  equation  we  have 
to  solve  is 


(8.14) 


where 


(|R,r|)  ^  {-R?) 

2(hnf  ^  2{  AK)= 


The  solution  is 


A  =  ^  .  (^) 

2(Aw)2  ^2(Aw)2 


(8.15) 


F(w)  = 


The  solution  is 


s{t)  =  (t  +  .  (8.16) 


{2Any 


;  Q2  =  -  Qj  .  (9.7) 


9.  INVERSE  FREQUENCY 


The  eigenfunctions  of  TZ  in  the  frequency  repre¬ 
sentation  are  obtained  from 


We  define  the  inverse  frequency  operator  by 


—  u(r,u})  =  r  u(r,w) 
w 


which  gives 


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u{r,u)  =  ~  ‘*'o/r)  .  (9-9) 


In  the  time  representation  they  are 


I 


=  /  5‘(a>-|-^g)5(u;-^g)exp[j’^ln^^ 

'  (10.4) 


The  distribution  is 


P{t,r)  jjj  S'{u:  +  i^)5(u;  -  \e) 


We  note  that  in  the  time  representation  the  in¬ 
verse  frequency  operator  is  an  integral  operator, 


n 


(9.11) 


exp[;^^ln — — yr]  e  •’*'  ^‘'°d0da  <Lo  (10.5) 

O  U>  —  ^o 


which  after  considerable  simplification  gives 


10.  DISTRIBUTION  OF  TIME 
AND  INVERSE  FREQUENCY 


PiUr) 


1  f  f  Y 

STrr^  y  \sinhu/2/ 


^-ju/out/r 


We  now  obtain  joint  distributions  of  time  and 
inverse  frequency  by  two  different  methods.  The 
first  method  is  a  direct  application  of  the  general 
approach  presented  in  the  Introduction.  However, 
since  inverse  frequency  is  a  function  of  frequency, 
rather  than  an  operator,  there  is  a  simpler  method 
for  this  special  case.  This  is  presented  as  method 
two. 


S‘i 


u>ou  e 


u/2 


u)ou  e 


-v/2 


2r  sinh  u/2  2r  sinh  u/2 


)du.  (10.6) 


This distribution was obtained by Bertrand and Bertrand by different methods [2,3,4].


B.  Method  2 


A.  Method  1 

We  use  the  method  presented  in  the  Introduc¬ 
tion  with  the  two  operators  being  T  and  TZ.  The 
characteristic  function  is 

M(r,a)  =  J  S’{u>)  e^*^+-'‘''^5(a;)  (Lj  .  (10.1) 

One  can  show  that 

€■’'”'■^•’"’^5(0;)  =  j  S{u;-e)exp\j‘^\n 

(10.2) 

Therefore 

M{t,(7)  =  j  5’(a;)5(aj-0)exp[j^ln 

(10.3) 


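Since inverse frequency is functionally related to frequency we can use the standard methods for transformation of variables

$$ P(t,r)\, dt\, dr = P(t,\omega)\, dt\, d\omega \tag{10.7} $$

where $P(t,\omega)$ is a time frequency distribution. In particular we take

$$ r = \omega_0/\omega , \tag{10.8} $$

$$ dr = -\frac{r^2}{\omega_0}\, d\omega \tag{10.9} $$

and therefore

$$ P(t,r) = P(t,\omega) \left| \frac{d\omega}{dr} \right|_{\omega = \omega_0/r} \tag{10.10} $$

$$ \phantom{P(t,r)} = \frac{\omega_0}{r^2}\, P(t,\omega_0/r) . \tag{10.11} $$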
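The effect of the Jacobian factor $\omega_0/r^2$ can be checked numerically on a one-dimensional density: without it, the transformed density would not integrate to one. The sketch below is an illustration under assumptions, not part of the paper: the frequency density `p_omega`, the value of `omega0`, the integration limits and the helper names are all hypothetical.

```python
import math

omega0 = 2.0   # reference frequency (hypothetical value)

def p_omega(w):
    """Example frequency density on w > 0 (an assumption): w * exp(-w), total mass 1."""
    return w * math.exp(-w) if w > 0 else 0.0

def p_r(r):
    """Density of r = omega0 / w obtained with the Jacobian factor omega0 / r^2."""
    return (omega0 / r ** 2) * p_omega(omega0 / r)

def integrate_log(g, lo, hi, n=40000):
    """Midpoint rule in x = ln(v): integral of g(v) dv = integral of g(e^x) e^x dx."""
    a, b = math.log(lo), math.log(hi)
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        v = math.exp(a + (i + 0.5) * dx)
        total += g(v) * v
    return total * dx

total_omega = integrate_log(p_omega, 1e-8, 1e3)
total_r = integrate_log(p_r, 1e-3, 1e5)
print(total_omega, total_r)
```

A logarithmic grid is used because the transformed density decays only like $1/r^3$ for large $r$, so a uniform grid would need far more points to capture the tail.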
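What distribution should we use for the time frequency distribution? We can use any one, since they are all related by way of the kernel. If we use the Wigner distribution

$$ P(t,\omega) = \frac{1}{2\pi} \int s^*(t - \tfrac12\tau)\, e^{-j\tau\omega}\, s(t + \tfrac12\tau)\, d\tau \tag{10.12} $$

we obtain

$$ P(t,r) = \frac{\omega_0}{r^2}\, W(t,\omega_0/r) \tag{10.13} $$

$$ \phantom{P(t,r)} = \frac{1}{2\pi} \frac{\omega_0}{r^2} \int s^*(t - \tfrac12\tau)\, e^{-j\tau\omega_0/r}\, s(t + \tfrac12\tau)\, d\tau \tag{10.14} $$

$$ \phantom{P(t,r)} = \frac{1}{2\pi} \frac{\omega_0}{r^2} \int S^*(\omega_0/r + \theta/2)\, e^{-jt\theta}\, S(\omega_0/r - \theta/2)\, d\theta . \tag{10.15} $$

Any other time frequency distribution can be used. In fact the general class of distributions can be used to obtain the general class of time and inverse frequency distributions. The general class of time frequency distributions is [5,7]

$$ P(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s^*(u - \tfrac12\tau)\, s(u + \tfrac12\tau)\, du\, d\tau\, d\theta . \tag{10.16} $$

Using Eq. (10.11) we have a general class for time and inverse frequency

$$ P(t,r) = \frac{1}{4\pi^2} \frac{\omega_0}{r^2} \iiint e^{-j\theta t - j\tau\omega_0/r + j\theta u}\, \phi(\theta,\tau)\, s^*(u - \tfrac12\tau)\, s(u + \tfrac12\tau)\, du\, d\tau\, d\theta . \tag{10.17} $$

In terms of the spectrum this is

$$ P(t,r) = \frac{1}{4\pi^2} \frac{\omega_0}{r^2} \iiint e^{-j\theta t - j\tau\omega_0/r + j\tau u}\, \phi(\theta,\tau)\, S^*(u + \tfrac12\theta)\, S(u - \tfrac12\theta)\, d\theta\, d\tau\, du . \tag{10.18} $$

A. Marginals

The marginals are easily obtained

$$ \int P(t,r)\, dr = |s(t)|^2 , \tag{10.19} $$

$$ \int P(t,r)\, dt = \frac{\omega_0}{r^2}\, |S(\omega_0/r)|^2 . \tag{10.20} $$

As before the time marginal is as expected, $|s(t)|^2$. The inverse frequency marginal can be understood from the following considerations. The distribution of frequency is

$$ P(\omega) = |S(\omega)|^2 \tag{10.21} $$

hence the distribution of $r$ is

$$ P(r) = P(\omega) \left| \frac{d\omega}{dr} \right|_{\omega = \omega_0/r} \tag{10.22} $$

$$ \phantom{P(r)} = \frac{\omega_0}{r^2}\, |S(\omega_0/r)|^2 , \tag{10.23} $$

which agrees with Eq. (10.20).

11. THE WAVELET TRANSFORM

The modulus squared of the wavelet transform can be viewed as a joint representation of time and “scale”. The wavelet transform is

$$ WT(t,r) = \frac{1}{\sqrt{|r|}} \int s(t')\, \psi\!\left( \frac{t' - t}{r} \right) dt' \tag{11.1} $$

where we have used $r$ instead of the traditional $a$. The reason we have done so is that it will turn out that in the wavelet transform it is inverse frequency that comes in. The density of time and $r$ is

$$ P_{WT}(t,r) = |WT(t,r)|^2 . \tag{11.2} $$

We now address the relation of this distribution to the type of distributions discussed in the previous sections. The first to consider a possible relationship with time frequency distributions were Jeong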
and  Williams  [11,12]  and  Rioul  [14]  using  different 
approaches.  Jeong  and  Williams  [11,12]  showed  in 
a  simple  and  direct  way  that  the  distribution  given 
by  Eq.  (11.2)  can  be  obtained  from  the  general 
class  of  time  frequency  distributions  by  taking  an 
appropriate  kernel  and  evaluating  the  distribution 
at  inverse  frequency.  Rioul  [14]  showed  that  it 
can  be  expressed  as  a  convolution  of  two  Wigner 
distributions. Posch [17] then generalized this to show $P_{WT}$ can be expressed as a convolution of any two time frequency distributions. Rioul and Flandrin [15] used these relationships to define a general class of time and scale. We emphasize that “scale” as used in these works simply means inverse frequency. The distribution given by Eq. (11.2) is a member of the general class given by Eq. (10.17). To explicitly obtain the kernel one expands Eq. (11.2) and puts it in the form given by Eq. (10.17). The kernel is then

$$ \phi(\theta,\tau) = 2\pi\, \frac{r^2}{\omega_0}\, e^{j\tau\omega_0/r}\, \phi_s(r\theta, \tau/r) \tag{11.3} $$

where $\phi_s$ is the kernel that produces a spectrogram with window $\psi$,

$$ \phi_s(\theta,\tau) = \int e^{-j\theta u}\, \psi^*(u + \tfrac12\tau)\, \psi(u - \tfrac12\tau)\, du . \tag{11.4} $$

This is essentially the result obtained by Jeong and Williams. We note that it is not only a function of $\theta$ and $\tau$ but also depends on $r$. Therefore we see that the distribution obtained from the wavelet transform is really a distribution of time and inverse frequency.

12.  CONCLUSION 

We have shown that scale as defined by the operator $\mathcal{S}$ and by the inverse frequency operator $\mathcal{R}$ both lead to a form of scaling and that these two different ideas have both been used in the literature for the concept of scale. It is important to keep them separate because although they do have many similarities they are quite different. They have very different densities, the density of scale being given by Eq. (4.8) and the density of inverse frequency by Eq. (10.20).

Another way to see that they are very different is to consider their commutator. The commutator of $\mathcal{S}$ and $\mathcal{R}$ is

$$[\mathcal{S}, \mathcal{R}] = -j\mathcal{R} \tag{12.1}$$

That is, scale and inverse frequency do not commute. That is an indication that they are quite different in their physical values. Which of the two mathematical constructs is better suited to describe our intuitive notion of “scale” remains to be seen. Perhaps both. Perhaps neither.
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Abstract 

By construction, time-scale methods are well-suited for analyzing processes (such as 1/f processes) which are characterised simultaneously by nonstationarities in time and self-similarity properties in scale. In this respect, wavelets offer a natural possibility of time-scale analysis which is particularly powerful in the case of fractional Brownian motion (fBm): for instance, it is shown that mild conditions can ensure orthonormal wavelets to provide almost Karhunen-Loève expansions of fBm from which the scaling exponent can be estimated. Interesting scaling properties hold for continuous wavelet transforms too, especially when generalizing fBm to the case of locally self-similar processes. However, it is argued that, in the continuous case, a more general class of time-scale distributions (of which the wavelet transform is only a special case) can be used with possibly increased performance, e.g. for estimating local scaling exponents.


1  Introduction 

In a large number of physical phenomena (e.g. in turbulence), self-similar processes with a 1/f-type spectral behavior over wide ranges of frequencies are observed. Although of great importance, the study of processes of this kind is faced with a number of difficulties (such as the slow decay of the correlation structure) which call for specific models and adapted analysis tools.

One such model has been proposed in [21] and is referred to as fractional Brownian motion (fBm). Among other properties, it possesses that of being statistically self-similar, which means that any portion of a given fBm can be viewed (from a statistical point of view) as a scaled version of a larger part of the same process. This is of course in the spirit of wavelets [5], which can all be deduced from one elementary waveform by means of shifts and dilations.
On the other hand, recent works [13] [26] have emphasized the fact that wavelets are only a special case within a more general class of time-scale distributions, some of them with possibly better properties.

It is therefore the purpose of this paper to summarize a number of results concerning the usefulness of wavelets for analyzing (possibly modified) fBm [11] [12], as well as to present the more general framework into which wavelets fit, hence suggesting companion ways of time-scale analysis for self-similar and 1/f-type processes.


2  Self-similar  and  1/f  processes

2.1  Fractional  Brownian  motion 

Fractional Brownian motion (fBm) is a natural extension of ordinary Brownian motion [21]. It is a Gaussian zero-mean nonstationary stochastic process $B_H(t)$, indexed by a single scalar parameter $0 < H < 1$, the usual Brownian motion being recovered from the specification $H = 1/2$.

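It is a nonstationary process since [30]

$$\mathcal{E}\left(B_H(t)B_H(s)\right) = \frac{\sigma^2}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right), \tag{2.1}$$

so that $\mathrm{var}(B_H(t)) = \sigma^2 |t|^{2H}$, where $\mathcal{E}$ stands for the expectation operator. As a nonstationary process, fBm only admits an average spectrum [30] [8]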

$$S_{B_H}(\omega) \propto \frac{1}{|\omega|^{2H+1}}, \tag{2.2}$$

which makes it well-suited for modeling 1/f-type processes.

Increments of fBm are stationary and self-similar in the sense that the probability properties of the process $B_H(t+s) - B_H(t)$ only depend on the lag variable $s$ with, moreover,

$$\left(B_H(t + as) - B_H(t)\right) \stackrel{d}{=} |a|^H \left(B_H(t + s) - B_H(t)\right), \tag{2.3}$$

where $\stackrel{d}{=}$ means equality in distribution.

One of the key problems related to the analysis of fBm is the estimation of the self-similarity parameter $H$, which can be interpreted as a scaling exponent governing the fluctuations of the increment process.
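As an illustration of the covariance structure (2.1), the following minimal sketch draws a sample path of fBm on a regular grid. It assumes the covariance $\frac{\sigma^2}{2}(|t|^{2H} + |s|^{2H} - |t-s|^{2H})$ and uses a Cholesky factorization, which is only one of several possible synthesis methods and is not the one used in the paper; the function names `fbm_covariance` and `simulate_fbm` are hypothetical.

```python
import numpy as np

def fbm_covariance(t, hurst, sigma=1.0):
    """Covariance matrix of fBm sampled at times t, following eq. (2.1)."""
    t = np.asarray(t, dtype=float)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    return 0.5 * sigma**2 * (
        np.abs(tt) ** (2 * hurst)
        + np.abs(ss) ** (2 * hurst)
        - np.abs(tt - ss) ** (2 * hurst)
    )

def simulate_fbm(n, hurst, sigma=1.0, seed=0):
    """Draw one fBm path at times 1/n, 2/n, ..., 1 (assumed grid)."""
    t = np.arange(1, n + 1) / n           # skip t = 0, where the variance vanishes
    cov = fbm_covariance(t, hurst, sigma)
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    rng = np.random.default_rng(seed)
    return t, chol @ rng.standard_normal(n)

t, path = simulate_fbm(64, hurst=1.0 / 3.0)
```

For $H = 1/2$ the covariance reduces to $\sigma^2 \min(t, s)$, i.e. ordinary Brownian motion, which gives a quick sanity check of the formula. The cost of the factorization grows as the cube of the number of samples, so this approach is only practical for short records.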
2.2  Locally  self-similar  processes

It is clear that, in many situations, fBm can appear as a too restrictive model and more or less ad hoc modifications are required. One such modification is to drop the assumption of global self-similarity (characterized by only one value of $H$ governing identical scaling properties at all scales) and to replace it by a milder requirement of local self-similarity. In such a case, only small scale behavior is concerned and the scaling exponent $H$ is allowed to be time-dependent. This corresponds to processes $x(t)$ such that the local fluctuations of their increments satisfy

$$\mathcal{E}\left((x(t+\tau) - x(t))^2\right) \sim |\tau|^{2H(t)}, \quad \tau \to 0. \tag{2.4}$$

The 1/f spectral behavior of such processes is mainly dependent on the average value of $H(t)$ over time but the "spectrum" of all the possible values of $H(t)$ provides additional information on a possible multifractal mechanism underlying the observed process, which is of considerable importance [14] [22].

3  Wavelets  and  self-similar  processes 

Two important features are to be taken into account when analyzing fBm or locally self-similar processes: nonstationarity and self-similarity. This suggests looking for some analysis which would be time-dependent and scale-dependent, respectively. As a result, wavelet analysis [5] which, by nature, is a time-scale method, appears as a natural possibility. First attempts at analyzing or synthesizing fBm via wavelets can be found in [8] [19] or [29]; most of the results given below provide a brief account of a more complete study reported in [12].


3.1  Orthonormal  wavelets 

Let us first consider some discrete orthonormal wavelet decomposition of a given fBm $B_H(t)$. By definition, wavelet coefficients are given by [19] [23]

$$d_j[n] = 2^{-j/2} \int_{-\infty}^{+\infty} B_H(t)\, \psi(2^{-j}t - n)\, dt, \quad j \in \mathbb{Z},\ n \in \mathbb{Z}, \tag{3.1}$$

where $\psi(t)$ is the basic "mother" wavelet, required to satisfy the admissibility condition [17]

$$\int_{-\infty}^{+\infty} \psi(t)\, dt = 0. \tag{3.2}$$

The simplest family of orthonormal wavelets is the Haar system

$$\psi(t) = \begin{cases} +1, & 0 \le t < 1/2, \\ -1, & 1/2 \le t < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{3.3}$$


For any given resolution $2^J$, the wavelet mean-square representation of fBm is

$$B_H(t) = 2^{-J/2} \sum_{n=-\infty}^{+\infty} a_J[n]\, \phi(2^{-J}t - n) + \sum_{j=-\infty}^{J} 2^{-j/2} \sum_{n=-\infty}^{+\infty} d_j[n]\, \psi(2^{-j}t - n), \tag{3.4}$$

where equality is to be understood in a mean-square sense and where the approximation coefficients

$$a_J[n] = 2^{-J/2} \int_{-\infty}^{+\infty} B_H(t)\, \phi(2^{-J}t - n)\, dt \tag{3.5}$$

are computed with the help of the "scaling function" (or "father wavelet") $\phi(t)$ associated with $\psi(t)$ [19] [23]. In this picture, the wavelet coefficients are interpreted as details, i.e. as a measure of difference in information between two successive approximations.

For each scale $2^j$, the wavelet coefficients $d_j[n]$ form a discrete sequence of random coefficients but, although the family $\{2^{-j/2}\psi(2^{-j}t - n),\ j \in \mathbb{Z},\ n \in \mathbb{Z}\}$ is an orthonormal system, there is a priori no reason for them to be uncorrelated. The explicit correlation structure of the wavelet coefficients can be derived [12], one of its consequences being that, when normalized according to $\tilde{d}_j[n] = (2^j)^{-(H+1/2)} d_j[n]$, wavelet coefficients of fBm give rise to

•  time sequences which are self-similar and stationary in the sense that, for any $j$, $\mathcal{E}(\tilde{d}_j[n]\tilde{d}_j[m])$ is a unique function of $n - m$, namely

$$\mathcal{E}\left(\tilde{d}_j[n]\tilde{d}_j[m]\right) = \frac{\sigma^2}{2}\left(-\int \gamma_\psi(\tau - (n - m))\, |\tau|^{2H}\, d\tau\right), \tag{3.6}$$

with $\gamma_\psi(\tau) = \int \psi(t)\, \psi(t - \tau)\, dt$;

•  scale sequences which are stationary in the sense that, for any $n$ and $m = 2^{j-k}n$ associated with synchronous time instants, $\mathcal{E}(\tilde{d}_j[n]\tilde{d}_k[m])$ is a unique function of $j - k$, namely

$$\mathcal{E}\left(\tilde{d}_j[n]\tilde{d}_k[2^{j-k}n]\right) = \frac{\sigma^2}{2}\left(-\int A_\psi(2^{j-k}, \tau)\, |\tau|^{2H}\, d\tau\right) (2^{j-k})^{-(H+1/2)}, \tag{3.7}$$

with $A_\psi(\alpha, \tau) = \sqrt{\alpha} \int \psi(t)\, \psi(\alpha t - \tau)\, dt$.
(Both results were first established in the case of continuous wavelet transforms, the former in [8] and the latter in [28].)

One key feature revealed by this structure is the stationarity of detail sequences at any resolution, nonstationarity of fBm being in fact encoded in the approximation sequences for which we get the time-dependent variance

$$\mathrm{var}(a_J[n]) = \frac{\sigma^2}{2}\left(-\int \left(\gamma_\phi(\tau) - 2\phi(\tau - n)\right) |\tau|^{2H}\, d\tau\right) (2^J)^{2H+1}. \tag{3.8}$$

The variance of wavelet coefficients follows the power law

$$\mathrm{var}(d_j[n]) = \frac{\sigma^2}{2}\, V_\psi(H)\, (2^j)^{2H+1}, \tag{3.9}$$

where $V_\psi(H)$ is defined by

$$V_\psi(H) = -\int_{-\infty}^{+\infty} \gamma_\psi(\tau)\, |\tau|^{2H}\, d\tau. \tag{3.10}$$

Therefore, the fBm index $H$ can be easily obtained from the slope of this variance plotted as a function of scale in a log-log plot [19] [15]

$$\log_2\left(\mathrm{var}(d_j[n])\right) = (2H + 1)\, j + \text{constant}. \tag{3.11}$$

An  example  is  given  in  Figure  1. 
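The slope rule (3.11) can be tried numerically with a short sketch. The code below is an illustration under stated assumptions, not the procedure behind Figure 1: it uses a discrete Haar pyramid (rather than Daubechies 6), takes the signal samples as the finest approximation coefficients, and tests on ordinary Brownian motion built as a cumulative sum of white noise, for which $H = 1/2$ is expected. The function names and the choice of discarding the two finest levels (where sampling effects bias the variance) are hypothetical.

```python
import numpy as np

def haar_detail_variances(x, n_levels):
    """Empirical variance of orthonormal Haar details d_j[n], j = 1..n_levels."""
    approx = np.asarray(x, dtype=float)
    variances = []
    for _ in range(n_levels):
        approx = approx[: 2 * (len(approx) // 2)]    # keep an even number of samples
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        variances.append(np.mean(detail**2))          # details are zero-mean
    return np.array(variances)

def estimate_hurst(x, n_levels, first_level=3):
    """Fit log2(var(d_j)) = (2H + 1) j + constant and solve for H."""
    var = haar_detail_variances(x, n_levels)
    j = np.arange(1, n_levels + 1)
    keep = j >= first_level
    slope, _ = np.polyfit(j[keep], np.log2(var[keep]), 1)
    return (slope - 1.0) / 2.0

rng = np.random.default_rng(0)
brownian = np.cumsum(rng.standard_normal(2**16))      # ordinary Brownian motion
h_hat = estimate_hurst(brownian, n_levels=9)
```

The estimate fluctuates from one realization to the next, mostly because the coarsest scales contribute only a few coefficients; weighting the fit by the number of coefficients per scale is a natural refinement.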

In general, wavelet coefficients of fBm are correlated in both time and scale and more can be said about this correlation [27] [12]. The ideal situation would correspond to special cases for which perfect decorrelation could be achieved. In such a case, the wavelet decomposition would provide us with a Karhunen-Loève-type expansion [29] and it would then play the role of a whitening filter especially adapted to self-similar processes (e.g. for the estimation of $H$ according to eq. (3.11) via empirical variance estimates).

Although this goal cannot be strictly achieved, it turns out that the simplest orthonormal wavelet system, i.e. the Haar system, approaches such a doubly orthogonal decomposition when $H = 1/2$, i.e. for ordinary Brownian motion. More precisely [12], if we let $d_j[n],\ j \in \mathbb{Z},\ n \in \mathbb{Z}$ be the Haar coefficients associated with ordinary Brownian motion (i.e. fBm with $H = 1/2$), the correlation of $d_j[n]$ with Haar coefficients at finer scales $k < j$ is such that

$$\mathcal{E}\left(d_j[n]\, d_k[m]\right) = 0 \tag{3.12}$$

outside of the (cone-shaped) time-scale domain defined by indexes $(m, k)$ such that $2^{j-k}n \le m \le 2^{j-k}(n + 1) - 1$. Moreover, we get for each scale

$$\mathcal{E}\left(d_j[n]\, d_j[m]\right) = \mathrm{var}(d_j[n])\, \delta_{n,m} \tag{3.13}$$

and  the  correlation  in  scale  varies  as  for  synchronous  time  in¬ 

stants. 


Figure 1. Wavelet analysis of fractional Brownian motion
The analyzed signal is a simulated fBm with H = 1/3 (left, plotted against time) and empirical variance estimates of the corresponding wavelet coefficients (Daubechies 6) are plotted as a function of log2(scale) in a log-log plot (right). According to eq. (3.11), this is supposed to be a straight line whose slope leads to the estimate $\hat{H} = 0.34$.

The Haar system does not provide, stricto sensu, a doubly orthogonal decomposition of ordinary Brownian motion. However, the correlation between distinct Haar coefficients decays as a power law of scale and is zero for a given scale. Some extensions of this behavior can be considered by retaining the Haar system as the wavelet basis while replacing the particular value $H = 1/2$ by different values in the range $0 < H < 1$ [12]. It reveals in fact three different regimes associated with the three possible situations $0 < H < 1/2$, $H = 1/2$ and $1/2 < H < 1$. The first two cases correspond respectively to an approximate decorrelation (fast decay) and a perfect decorrelation (only one non-zero coefficient) whereas the last one is associated with a long-term correlation (slow decay)

$$\mathcal{E}\left(d_j[n]\, d_j[m]\right) \sim O\left(|n - m|^{2(H-1)}\right) \tag{3.14}$$

when $|n - m| \gg 1$.
From a frequency point of view, the decay properties of the coefficients correlation are governed by


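$$\mathcal{E}\left(d_j[n]\, d_j[m]\right) = \frac{\sigma^2}{2\pi}\, \sin(\pi H)\, \Gamma(2H + 1)\, (2^j)^{2H+1} \int_{-\infty}^{+\infty} e^{i\omega(n-m)}\, \frac{|\Psi(\omega)|^2}{|\omega|^{2H+1}}\, d\omega, \tag{3.15}$$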

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. This heavily depends on the behavior of $\Psi(\omega)$ at the zero frequency, and hence on the number of vanishing moments of $\psi(t)$.

More precisely, if $\psi(t)$ has at most $R$ vanishing moments, i.e. if

$$\int_{-\infty}^{+\infty} t^r\, \psi(t)\, dt = 0, \quad r < R, \tag{3.16}$$

then it is not possible to prevent divergence of $|\Psi(\omega)|^2\, |\omega|^{-(2H+1)}$ at the zero frequency if the fBm index $H$ is such that $H > R - 1/2$. As a consequence, the correlation of the corresponding wavelet coefficients has a slow decay and the asymptotic behavior [12]

$$\mathcal{E}\left(d_j[n]\, d_j[m]\right) \sim O\left(|n - m|^{2(H-R)}\right) \tag{3.17}$$

when $|n - m| \gg 1$: this is exactly the Haar case ($R = 1$) for which the divergence situation corresponds to the fBm range $1/2 < H < 1$.

This drawback is easily overcome as soon as a wavelet with $R \ge 2$ is chosen, the quantity $R - 1/2$ being then ensured to exceed the maximum value of $H$ within its range, i.e. 1. As expected, the pathological situation of a slow decay of the wavelet coefficients correlation, which results jointly from the low regularity of the Haar system and from a particular range of $H$, is no longer encountered for more regular choices (e.g. any Daubechies wavelet [6] such that $R \ge 2$), large values of $R$ leading to almost uncorrelated coefficients [27]. (A simulation illustrating this fact is given in [11].)
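The threshold $H > R - 1/2$ quoted above can be recovered from a short low-frequency expansion. The step below is a sketch that assumes $\psi$ has exactly $R$ vanishing moments, so that $\Psi(\omega)$ behaves like $\omega^R$ near the origin; the constant $C$ is introduced here for illustration only.

```latex
|\Psi(\omega)|^2 \sim C\,|\omega|^{2R} \quad (\omega \to 0)
\;\;\Longrightarrow\;\;
\frac{|\Psi(\omega)|^2}{|\omega|^{2H+1}} \sim C\,|\omega|^{2(R-H)-1} \quad (\omega \to 0).
```

The right-hand side stays bounded at $\omega = 0$ only if $2(R - H) - 1 \ge 0$, i.e. $H \le R - 1/2$. With $R = 1$ this covers $0 < H \le 1/2$ only, whereas any $R \ge 2$ covers the whole range $0 < H < 1$.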

3.2  Continuous  wavelets 

Given an fBm of index $H$ and its continuous wavelet transform $T_{B_H}$ with wavelet $h(t)$ [17]


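$$T_{B_H}(t, a) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} B_H(s)\, h\!\left(\frac{s - t}{a}\right) ds, \tag{3.18}$$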


it can be shown that, as in the orthonormal case, time stationarity is observed at any scale [8]
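$$\mathcal{E}\left(T_{B_H}(t, a)\, T_{B_H}(s, a)\right) = \frac{\sigma^2}{2}\left(-\int_{-\infty}^{+\infty} \gamma_h\!\left(\tau - \frac{t - s}{a}\right) |\tau|^{2H}\, d\tau\right) a^{2H+1} \tag{3.19}$$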

as well as scale stationarity at any time [26]. Moreover, it turns out that the second-order stationarity of the wavelet transform (together with the self-similarity of its correlation structure from scale to scale) is in fact a characteristic property of fBm, as pointed out in [25].

One of the most attractive properties of the continuous wavelet transform is that, in the case of a locally self-similar process $x(t)$, the small scale behavior of the increment process is directly mirrored by the small scale behavior of the wavelet transform of $x(t)$ [1], namely

$$\mathcal{E}\left((x(t + \tau) - x(t))^2\right) \sim |\tau|^{2H(t)},\ \tau \to 0 \quad \Longrightarrow \quad \mathcal{E}\left(|T_x(t, a)|^2\right) \sim a^{2H(t)+1},\ a \to 0. \tag{3.20}$$

Therefore, if we want to measure the local scaling exponents $H(t)$, what is required is a procedure ending up with

$$H(t) = \frac{1}{2}\left(\lim_{a \to 0^+} \frac{\log \mathcal{E}\left(|T_x(t, a)|^2\right)}{\log a} - 1\right), \tag{3.21}$$

in which the unavailable ensemble average $\mathcal{E}\left(|T_x(t, a)|^2\right)$ is replaced by some efficient estimate.

It has been observed that the crude estimate $|T_x(t, a)|^2$ leads to a high variability for $\hat{H}(t)$, suggesting refined analyses such as maxima tracking in the wavelet representation prior to local least-squares fits "amplitude vs scale" in log-log coordinates [2]. Another approach is to consider time-scale analysis from a more general point of view, wavelets being only one possible solution.


4  Time-scale  energy  distributions 

The purpose of a time-scale representation is to describe the information contained in a signal in terms of time and scale, simultaneously [10]. One useful piece of information is energy, which in some cases can be obtained by integrating the representation (then referred to as an energy distribution) over the whole time-scale plane. This is especially the case for the scalogram (i.e. the squared modulus of the wavelet transform [13]) since we have for any finite energy signal $x(t)$

$$\iint |T_x(t, a)|^2\, \frac{dt\, da}{a^2} = \int |x(t)|^2\, dt. \tag{4.1}$$
However, it turns out that the same property holds for other distributions too, as discussed below.


4.1  A  general  class 

By construction, the role played by the scalogram in time-scale analysis is very similar to the one played by the spectrogram (i.e. the squared modulus of the short-time Fourier transform) in time-frequency analysis. We know that, within this latter framework, a much larger class of time-frequency energy distributions exists: it is referred to as Cohen's class [4] and reads

$$C_x(t, \nu; \Pi) = \iint W_x(u, n)\, \Pi(u - t, n - \nu)\, du\, dn, \tag{4.2}$$

where $W_x(t, \nu)$ is the so-called Wigner-Ville distribution

$$W_x(t, \nu) = \int_{-\infty}^{+\infty} x\!\left(t + \frac{\tau}{2}\right) x^*\!\left(t - \frac{\tau}{2}\right) e^{-i2\pi\nu\tau}\, d\tau \tag{4.3}$$

and $\Pi(t, \nu)$ some arbitrary parameterization function with the only constraint

$$\iint \Pi(t, \nu)\, dt\, d\nu = 1. \tag{4.4}$$

The interest of Cohen's class is at least twofold. First, it provides a unification of different definitions which are simply characterized by different parameterization functions. Among the simplest examples, it is easily checked that the specification $\Pi(t, \nu) = W_h(t, \nu)$ is associated with

$$C_x(t, \nu; W_h) = \left|\int_{-\infty}^{+\infty} x(u)\, h^*(u - t)\, e^{-i2\pi\nu u}\, du\right|^2, \tag{4.5}$$

i.e. with the spectrogram. This latter appears therefore as only a special case within the more general framework of Cohen's class. Second, its parameterization provides a natural way of deriving specific definitions from requirements imposed a priori.
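The spectrogram of eq. (4.5) is straightforward to compute in discrete time. The sketch below is a minimal illustration, assuming a finite-length window and FFT frequency bins; the function name `spectrogram`, the Hann window and the hop size are choices made here, not taken from the paper. Each frame uses a local time origin, which changes only the phase of the short-time Fourier transform and therefore not its squared modulus.

```python
import numpy as np

def spectrogram(x, window, hop=1):
    """Squared modulus of a discrete short-time Fourier transform.
    Rows index time frames, columns index FFT frequency bins."""
    x = np.asarray(x, dtype=complex)
    n_win = len(window)
    starts = range(0, len(x) - n_win + 1, hop)
    frames = np.array([x[s : s + n_win] * np.conj(window) for s in starts])
    return np.abs(np.fft.fft(frames, axis=1)) ** 2

n_win = 64
window = np.hanning(n_win)
bin_index = 8                                   # tone placed exactly on FFT bin 8
n = np.arange(512)
tone = np.exp(2j * np.pi * bin_index * n / n_win)
spec = spectrogram(tone, window, hop=16)
peak_bins = spec.argmax(axis=1)                 # dominant frequency bin per frame
```

For a stationary tone every frame peaks at the same bin, with an energy spread over neighbouring bins fixed by the window: this smoothing is the price paid for choosing $\Pi = W_h$.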

In this respect, it is known [9] that, for a number of theoretical reasons (marginal properties, localization in both time and frequency, estimation of instantaneous frequency, ...), the Wigner-Ville distribution is to be preferred to the spectrogram and that, for more practical reasons, a useful approximation of it is the so-called smoothed pseudo-Wigner-Ville distribution whose parameterization function is separable, i.e. of the form $\Pi(t, \nu) = g(t) H(\nu)$, which leads to the formulation

$$C_x(t, \nu; gH) = \iint h(\tau)\, g(u - t)\, x\!\left(u + \frac{\tau}{2}\right) x^*\!\left(u - \frac{\tau}{2}\right) e^{-i2\pi\nu\tau}\, du\, d\tau. \tag{4.6}$$
By analogy with the time-frequency case, it is expected that a companion situation could exist in the time-scale case. It has been shown that a natural counterpart of Cohen's class does exist [13] [26], the general class of time-scale energy distributions being

$$\Omega_x(t, a; \Pi) = \iint W_x(u, n)\, \Pi\!\left(\frac{u - t}{a}, an\right) du\, dn. \tag{4.7}$$

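This general formulation is in fact deduced from the time-scale covariance requirement

$$x'(t) = \frac{1}{\sqrt{a'}}\, x\!\left(\frac{t - t'}{a'}\right) \quad \Longrightarrow \quad \Omega_{x'}(t, a; \Pi) = \Omega_x\!\left(\frac{t - t'}{a'}, \frac{a}{a'}; \Pi\right) \tag{4.8}$$

imposed on bilinear forms [26].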

Within this framework, the scalogram appears as only a special case associated with the choice $\Pi(t, \nu) = W_h(t, \nu)$, exactly as the spectrogram does within Cohen's class and, again, other choices can be preferred in some circumstances. This can be the case for the family of affine smoothed Wigner-Ville distributions [13] associated with the separable specification and whose expression reads

$$\Omega_x(t, a; gH) = \frac{1}{a} \iint h\!\left(\frac{\tau}{a}\right) g\!\left(\frac{u - t}{a}\right) x\!\left(u + \frac{\tau}{2}\right) x^*\!\left(u - \frac{\tau}{2}\right) du\, d\tau. \tag{4.9}$$

4.2  Time-scale  energy  distributions  and  self-similar  processes 

The existence of a whole family of time-scale energy distributions offers a great versatility which does not necessarily reduce the problem of estimating scaling exponents to the use of scalograms. Loosely speaking, time-scale analysis can be thought of as a natural counterpart of time-frequency analysis, local scaling laws playing the role of instantaneous frequency. Precisely, we get [16], as a generalization of eq. (3.20),

$$\mathcal{E}\left([x(t + \tau) - x(t)]^2\right) \sim |\tau|^{2H(t)},\ \tau \to 0 \quad \Longrightarrow \quad \mathcal{E}\left(\Omega_x(t, a; \Pi)\right) \sim a^{2H(t)+1},\ a \to 0 \tag{4.10}$$

for all the distributions $\Omega_x$ such that their parameterization function $\Pi$ has a 1D partial Fourier transform


(4.11) 


which  satisfies 
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(4.12) 

This  holds  in  the  case  of  the  scalogram  since,  then, 

x(c,..)  =  iy  +  !)//•  ^*^-0  =>  =  //(o//'(0)  =  o, 

(4.13) 

because of the admissibility condition on $h(t)$. This also holds for affine smoothed Wigner-Ville distributions such that

GiOM  (I)  =  0.  (4.14) 

This latter case is of particular interest since, by controlling independently $g(t)$ and $h(t)$, we can come up, at will, with more or less smoothed versions (in time) of scalograms [13]. Therefore, given one recorded signal, affine smoothed Wigner-Ville distributions can be viewed as estimators of ensemble averaged scalograms for which the scaling property (3.20) holds. Within this interpretation, the above condition (4.14) ensures that the estimator is unbiased, whereas the time smoothing involved in the estimation reduces the variability of $\hat{H}(t)$. As expected, this modified procedure leads to effective improvements in the case of processes whose $H(t)$ is a piecewise constant or slowly-varying function [16] and its effectiveness for more general locally self-similar processes is currently under investigation.

5  Conclusion 

Time-scale analysis is a powerful framework for characterizing self-similar processes. In the case of global self-similarity (e.g. fractional Brownian motion), orthonormal wavelet decompositions are particularly efficient whereas the study of local self-similarity has been preferably conducted, up to now, with the help of continuous wavelet transforms. In this respect, and with the idea that the estimation of local scaling exponents has to be performed locally, across scales, it has been argued that time-scale distributions more general than wavelet-based distributions can be used with possibly increased performance.

However, it has been recently shown [24] that local scaling properties of a (multifractal) self-similar process can be directly characterized by global scaling properties of some partition function built upon a wavelet transform. In this respect too, it is an interesting question to study how such an approach could be combined with the more general class of time-scale energy distributions discussed here, with the orthonormal framework (and its computational efficiency), or even with scale-based dynamical models [3].
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1.  Introduction 

A fundamental problem in discrete signal processing is to find a numerical representation that is well adapted to processing tasks such as compact coding, noise removal, feature enhancement, pattern detection or recognition. Most classical methods build signal representations with linear transforms, generally based on filtering techniques. Some linear transforms such as the Karhunen-Loève decomposition or wavelet packets [1] can be adapted to the global signal properties. However, if the signal includes local patterns of very different types, such as edges and sinusoidal waves, the transform will not be adapted to at least one of these types of patterns. Fig. 3(a), 4(a), 5(a) show examples of such signals, composed of a sum of sinusoidal waves of different frequencies, plus a Dirac. Depending upon the relative energy of the sinusoidal waves and the Dirac, the best basis algorithm of Coifman and Wickerhauser [1] will choose either a Dirac representation (Fig. 3(b)), a sinusoidal representation (Fig. 5(b)), or an intermediate representation (Fig. 4(b)). A remarkable property of the wave-packet algorithm is that it chooses an orthogonal basis; however, the rigidity of this orthogonality can yield instabilities in the choice of the basis.

In this paper, we introduce a new non-linear operator that defines local adaptive transforms. Although it is non-linear, this transform has an energy conservation law, like an orthogonal basis decomposition. The signal is decomposed into elementary structures that best match its local patterns. These structures can be well-localized functions in the time/frequency plane, or other basic waveforms.

Section 2 introduces the basic theory of structure books. Section 3 describes in more detail the particular examples of structure books built with local time/frequency transforms. Numerical examples and signal processing applications are given in the last section.


2.  Structure  Books 

This section defines the decomposition of signals into structure books and the properties of this decomposition. The signal is decomposed into a sum of elementary waveforms that belong to a given set, called the dictionary. Depending upon the dictionary, the resulting transform can have very different properties. Let $H$ be a Hilbert space. Let $D = (e_i)_{i \in I}$ be a family of normalized vectors that belong to $H$. The family of vectors $D$ is called a dictionary. The index set $I$ might not be countable. We call $C$ a choice function of $D$. Such a function associates to each nonempty subset $E$ of $D$ an element $e_j$ that belongs to $E$, and we write $C(E) = e_j$. The axiom of choice guarantees that such a function exists.

The operator $T$ subtracts from any vector $f$ of $H$ its projection on one particular vector $e_j$ of $D$, chosen among those that have a large inner product with $f$. Let $\alpha$ be a constant such that $0 < \alpha < 1$, and

$$S = \sup_{i \in I} |\langle f, e_i \rangle|. \tag{1}$$

Let us also define

$$E = \left\{ e_i \in D \,:\, |\langle f, e_i \rangle| \ge \alpha S \right\}. \tag{2}$$

Let $e_j = C(E)$. The operator $T$ is defined by

$$Tf = f - \langle f, e_j \rangle\, e_j. \tag{3}$$

It follows that

$$\|f\|^2 = \|Tf\|^2 + |\langle f, e_j \rangle|^2. \tag{4}$$

To perform a full decomposition, we iterate on the operator $T$. We denote by $T^n$ the $n$-fold iteration of the operator $T$. For any $n \ge 0$ there exists $e_j^n \in D$ such that

$$T^{n+1}f = T^n f - \langle T^n f, e_j^n \rangle\, e_j^n. \tag{5}$$
The vector $f$ can be decomposed into

$$f = \sum_{n=0}^{m-1} \left(T^n f - T^{n+1} f\right) + T^m f, \tag{6}$$

where $T^0 f = f$. As a consequence of equation (5), we obtain
$$f = \sum_{n=0}^{m-1} \langle T^n f, e_j^n \rangle\, e_j^n + T^m f. \tag{7}$$

Equations (4) and (5) yield the energy conservation equation

$$\|f\|^2 = \sum_{n=0}^{m-1} |\langle T^n f, e_j^n \rangle|^2 + \|T^m f\|^2. \tag{8}$$
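The iteration of equations (3) to (8) can be sketched in a few lines for finite discrete signals. The code below is an illustration under stated assumptions: the dictionary is finite and real-valued, its atoms are stored as unit-norm rows, and the choice function simply takes the first atom that reaches the largest inner product (the case where the threshold $\alpha S$ is replaced by the maximum itself). The names `pursuit_step` and `structure_book`, and the Dirac-plus-cosine dictionary, are hypothetical.

```python
import numpy as np

def pursuit_step(residual, dictionary):
    """One application of T: remove the projection on the best-matching atom.
    `dictionary` holds unit-norm atoms as rows (real-valued here)."""
    products = dictionary @ residual
    j = int(np.argmax(np.abs(products)))      # first index reaching the maximum
    coeff = products[j]
    return residual - coeff * dictionary[j], j, coeff

def structure_book(f, dictionary, n_iter):
    """Iterate T and collect the structures (atom index, inner product)."""
    residual = np.asarray(f, dtype=float).copy()
    book = []
    for _ in range(n_iter):
        residual, j, coeff = pursuit_step(residual, dictionary)
        book.append((j, coeff))
    return book, residual

# Hypothetical dictionary: Diracs plus discrete cosines, rows normalised.
P = 32
n = np.arange(P)
diracs = np.eye(P)
cosines = np.array([np.cos(np.pi * (n + 0.5) * k / P) for k in range(P)])
cosines /= np.linalg.norm(cosines, axis=1, keepdims=True)
dictionary = np.vstack([diracs, cosines])

f = 3.0 * cosines[5] + 2.0 * diracs[10]       # a cosine plus a Dirac
book, residual = structure_book(f, dictionary, n_iter=20)
energy = sum(c**2 for _, c in book) + residual @ residual
```

Here `energy` reproduces $\|f\|^2$ up to rounding, as predicted by eq. (8), whatever the number of iterations. Recomputing all inner products at each step keeps the sketch short; an implementation concerned with speed would update them incrementally instead.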

Let us denote by $V$ the closure of the space of vectors that are linear expansions of vectors in the dictionary $D$. Let $W$ be the orthogonal complement of $V$ in $H$. Because of the energy conservation equation (8), one can prove that when $m$ goes to $+\infty$, $T^m f$ converges weakly to the orthogonal projection of $f$ on $W$. We say that the dictionary is complete if and only if $V = H$. In this case, $W = \{0\}$ and hence $T^m f$ converges weakly to 0. This means that the sum $\sum_{n=0}^{m-1} \langle T^n f, e_j^n \rangle\, e_j^n$ converges weakly to $f$. In general, the type of convergence of $\sum_{n=0}^{m-1} \langle T^n f, e_j^n \rangle\, e_j^n$ depends upon the properties of the dictionary $D$. A detailed analysis of this convergence is under preparation [3]. In this short paper, we rather concentrate on the practical applications of this transform to discrete signal processing.

In signal processing, we manipulate finite discrete signals that can be considered as elements of a finite dimensional space $H$. One can easily prove [3] the following theorem, which guarantees that for finite discrete signals, a structuring transform provides a complete and stable signal representation.


Theorem

If the space $H$ has a finite dimension, $T^n f$ converges in norm to the orthogonal projection of $f$ on $W$. If the dictionary $D$ is complete, then

$$f = \sum_{n=0}^{+\infty} \langle T^n f, e_j^n \rangle\, e_j^n, \tag{9}$$

and

$$\|f\|^2 = \sum_{n=0}^{+\infty} |\langle T^n f, e_j^n \rangle|^2. \tag{10}$$


We call a structure the information given by $(e_j^n, \langle T^n f, e_j^n \rangle)$ and structure book the sequence of structures $\left((e_j^n, \langle T^n f, e_j^n \rangle)\right)_{n \in \mathbb{N}}$. This theorem proves that a structure book is a complete and stable representation of finite signals. In finite dimension, an example of choice function is implemented by selecting the first vector encountered in the dictionary data base that satisfies

$$|\langle f, e_j \rangle| = \max_{i \in I} |\langle f, e_i \rangle|. \tag{11}$$


3.  Time/Frequency  Dictionaries 

The dictionary that is used to build the structure book must be adapted to the class of signals that is considered and to the particular applications. Local time/frequency dictionaries are often well adapted to characterize patterns of very different types. To create a signal representation that is invariant by translation, the dictionary must be composed of elementary waveforms that are translated on the signal grid. Let us suppose that the signals have $P$ samples, and are periodized to solve border problems. Let $(e_k)_{1 \le k \le K}$ be a set of elementary discrete waveforms and $e_{k,p}$ be the translation of $e_k$ by $p$:

$$e_{k,p}(n) = e_k(n - p).$$

We build the dictionary $D = (e_{k,p})_{1 \le k \le K,\ 0 \le p < P}$. Suppose that a signal $f$ is decomposed into a structure book given by

$$\left(\left(e^n_{k,p}, \langle T^n f, e^n_{k,p} \rangle\right)\right)_{n \in \mathbb{N}}.$$

If the signal is translated by $l$ samples, one can easily prove that the waveforms of the structure book are translated but not modified and are given by

$$\left(\left(e^n_{k,p+l}, \langle T^n f, e^n_{k,p} \rangle\right)\right)_{n \in \mathbb{N}}.$$
In the following, we call shifting dictionary any dictionary that is built by translating a set of
elementary waveforms. One can easily prove that the dictionary is complete if and only if there
exist two constants $A > 0$ and $B > 0$ such that for all frequencies $\omega$, the discrete Fourier
transforms $\hat e_k(\omega)$ of the signals $e_k$ satisfy

$$A \le \sum_{k=1}^{K} |\hat e_k(\omega)|^2 \le B . \tag{12}$$
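Condition (12) can be checked numerically for a given set of elementary waveforms. The following is a minimal sketch, assuming the waveforms are stored as rows of a NumPy array on the periodized $P$-point grid; the helper name `shifting_dictionary_bounds` and the two test waveforms are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shifting_dictionary_bounds(atoms):
    """Return (A, B): min and max over DFT frequencies of sum_k |e_k_hat(w)|^2.

    atoms: array-like of shape (K, P), one elementary waveform e_k per row.
    """
    atoms = np.atleast_2d(np.asarray(atoms, dtype=float))
    spectra = np.fft.fft(atoms, axis=1)            # e_k_hat on the P-point grid
    total = np.sum(np.abs(spectra) ** 2, axis=0)   # sum over k at each frequency
    return float(total.min()), float(total.max())

P = 16
dirac = np.zeros(P)
dirac[0] = 1.0                    # flat spectrum: every frequency is covered
pair = np.zeros(P)
pair[:2] = 1.0 / np.sqrt(2.0)     # spectrum vanishes at w = pi
```

With these inputs, the translated Dirac alone gives a lower bound of 1, so the dictionary is complete. The two-sample average alone gives a lower bound that is numerically zero, so its translates cannot represent the highest frequency.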


The wavelet transform and multiscale window Fourier transform provide two particularly
interesting local time/frequency dictionaries. If we want a representation that is invariant by
scaling by any power of $s > 1$, we build a dictionary that is composed of wavelets dilated by $s^m$,
$m \in \mathbb{Z}$, and translated on the signal grid. In this case, the structure book corresponds to a
non-linear wavelet transform. The signal $f$ is decomposed into a sum of wavelets translated and
dilated. Although it is not an orthogonal wavelet transform, equation (10) proves that we have a
conservation of energy.

To compute a discrete representation which is invariant by translation, dilation and modulation,
we define a dictionary by dilating, translating and modulating a window function. The
structure book defines a signal decomposition that is local in the phase plane. It regroups the
waveforms of the time/frequency plane that best match the signal patterns.

The general algorithm that computes a structure book is illustrated by the block diagram of
Fig. 1. The algorithm requires finding a waveform in the dictionary whose inner product
(correlation) with the signal belongs to the set $E$ defined in (2). For this purpose, we build a data
base of correlation coefficients that carries the correlation of the signal with each waveform of the
dictionary. The inner product of a signal with a translated waveform can be written as a
convolution. For a shifting dictionary, this data base is thus computed by convolving the signal $f$
with each elementary waveform, which can be done with the fast Fourier transform. We then choose
one correlation coefficient whose absolute value is maximum, subtract the corresponding waveform
from the signal, as in equation (3), and update the correlation coefficients. This operation is
repeated until the total energy of the selected structures is close to the signal energy.
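The loop described above can be sketched as follows for a shifting dictionary on a periodized grid. This is an illustrative sketch under stated assumptions, not the authors' implementation: the choice function is taken to be the maximum absolute correlation, the waveforms are normalized to unit norm, and the correlation data base is recomputed by FFT at every iteration instead of being updated incrementally. The names `correlations` and `structure_book` are hypothetical.

```python
import numpy as np

def correlations(f, atoms):
    """C[k, p] = <f, e_{k,p}> with e_{k,p}(n) = e_k((n - p) mod P), computed by FFT."""
    F = np.fft.fft(f)
    E = np.fft.fft(atoms, axis=1)
    return np.real(np.fft.ifft(F * np.conj(E), axis=1))

def structure_book(f, atoms, n_iter=50, tol=1e-6):
    """Greedy decomposition of f; returns ([(k, p, coefficient), ...], residual)."""
    f = np.asarray(f, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)  # unit-norm waveforms
    residual = f.copy()
    energy = float(np.dot(f, f))
    captured = 0.0
    book = []
    for _ in range(n_iter):
        C = correlations(residual, atoms)           # correlation data base (recomputed)
        k, p = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        c = C[k, p]
        residual = residual - c * np.roll(atoms[k], p)  # subtract the selected waveform
        book.append((int(k), int(p), float(c)))
        captured += c * c
        if energy - captured <= tol * energy:       # selected energy close to signal energy
            break
    return book, residual
```

Summing `c * np.roll(atoms[k], p)` over the book (with unit-norm waveforms) and adding the residual returns the input signal. The squared coefficients plus the residual energy add up to the signal energy, the finite-iteration counterpart of equation (10).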

Shifting dictionaries have the advantage of building signal representations that are shift
invariant, but they require substantial amounts of computations. The wavepackets [1] define
dictionaries where the structure book can be computed efficiently. Orthogonal wavepackets are
computed through a cascade of Quadrature Mirror Filter bank decompositions [4], along a tree
where each node has M sons, generally 2 for one-dimensional signals, as illustrated by Fig. 2. If
we put a Dirac at the root of this tree, the discrete waveforms that are generated at the nodes of
the tree are the wavepacket functions. The corresponding dictionary is a set of translated versions
of these elementary waveforms. These wavepackets can be regrouped into a set of orthonormal
bases. Coifman and Wickerhauser [1] showed that the wavepacket tree provides a local signal
decomposition in the time/frequency plane. As opposed to the best basis algorithm, we extract
from this tree a set of elementary waveforms that are not mutually orthogonal but that can match
locally the signal patterns. For wavepackets, the update of correlation coefficients can be
implemented efficiently with finite impulse response filters such as the ones of Daubechies [2]. The
selection of the correlation coefficients is done by using hash tables [3]. After processing the
structure book, the reconstruction of a signal from a structure book is simply done by adding the
waveforms of the structure book, as indicated by equation (9). The same type of computations
can be performed for signals of any dimension. For wavepacket dictionaries, this algorithm can
be efficiently implemented for images.
4. Numerical Experiments and Signal Processing Applications

This section presents numerical results for wavepacket dictionaries as well as signal
processing applications. In order to give a better understanding of the structure book properties, we
compare the structure book with the optimal wavepacket basis. Fig. 3, 4 and 5 show each of these
transforms for a Dirac embedded into a sum of sinusoidal waves. The wavepacket best basis makes a
global time/frequency localization choice that is controlled either by the Dirac or by the sinusoids.
On the contrary, the waveforms in the structure book are locally adapted to the signal properties.
The structures therefore match the signal, independently from the relative global energy of the Dirac
and the sinusoidal waves, as can be seen in Fig. 3(c), 4(c) and 5(c). Each block shown in these
figures corresponds to a structure. The darker the block, the larger the correlation coefficient.

The dictionary waveform of a signal structure can be viewed as a signal pattern and the
correlation coefficient as the amplitude of this pattern in the signal. This signal representation has
applications for pattern detection, recognition, noise removal, signal enhancement and compact
coding. We illustrate the application to pattern detection and noise removal with several
examples. Fig. 6 shows a simple example of signal separation. We know a priori that a Dirac yields
its highest correlation coefficients for a wavelet-type time/frequency localization. Fig. 6(b) shows the
structures of the whole structure book given in Fig. 5(c) that have a wavelet-type time/frequency
localization. Fig. 6(c) is the graph of the signal reconstructed from the selected structures shown
in Fig. 6(b). The Dirac is clearly restored, with little remaining oscillatory component. Fig. 7(a)
shows a Gabor function (Gaussian modulated by a sinusoidal wave). Fig. 7(b) is the sum of this
Gabor function with a white noise. The total energy of the noise is much larger than the energy
of the signal. Fig. 7(c) shows the best basis selection, which is completely driven by the white
noise. Fig. 7(d) shows the structure book, where the Gabor function clearly appears as a very dark
box. This Gabor function is easily discriminated, because its energy is much more concentrated
than the energy of the white noise. In Fig. 8(a), the input signal is a set of two Diracs, which can be
viewed as radar impulses, for example. To these two Diracs is added a colored noise, and the
sum is shown in Fig. 8(b). The Signal to Noise Ratio is -15.09 dB. Fig. 8(d) shows the structure
book of this signal. The structures of larger energy are noise components. However, one can
recognize two elongated bars due to the Diracs. As previously mentioned, Diracs, like any
singularities, yield structures whose time/frequency localization is of wavelet type.
To restore the impulse components of the signal, we thus select the structures of the
structure book that have a wavelet-type time/frequency decomposition. These structures are
shown in Fig. 8(e). The signal reconstructed from these structures is shown in Fig. 8(c). The
Diracs have been partly destroyed but are now clearly visible. Let us emphasize that this
algorithm works well because the noise has a different type of time/frequency localization than the
signal. After the noise removal, the SNR is 0.21 dB, which represents a gain of about 15 dB.
The structure book representation can be used for applications such as speech recognition,
where we know precisely what the information is, but also for problems where we try to find whether
there is any "information" in a signal, and how to extract it. There are many such issues in
medical signal processing, where new sensors, such as high resolution blood pressure sensors, yield
measurements that we do not know how to interpret.
Fig. 1 Algorithm to compute the structure book of a signal.

Fig. 3(b) Time/frequency representation of the best basis.

Fig. 3(c) Time/frequency representation of the structure book. Each rectangle
represents a particular structure which is indexed by its position in the
phase plane. The darker the rectangle, the larger the correlation coefficient.

Fig. 4(a) Dirac with sum of sinusoids (Dirac amplitude is 12).

Fig. 4(b) Time/frequency representation of the best basis.

Fig. 4(c) Time/frequency representation of the structure book.

Fig. 5(c) Time/frequency representation of the structure book.

Fig. 6(a) Dirac with sum of sinusoids (same as in Fig. 5(a)).

Fig. 6(b) Selected structures from the structure book shown in Fig. 5(c).

Fig. 6(c) Reconstructed signal from selected structures.

Fig. 7(a) Gabor function (Gaussian modulated by a sinusoidal wave).

Fig. 7(b) Gabor function plus white noise.

Fig. 7(c) Time/frequency representation of the best basis.

Fig. 7(d) Time/frequency representation of the structure book.

Fig. 8(e) Time/frequency representation of the selected structures from




ON  VARIABLE  LENGTH  WINDOWS  AND  WEIGHTED 
ORTHONORMAL  FUNCTIONS 

Bruce  W.  Suter  and  Mark  E.  Oxley 

Air Force Institute of Technology
Wright-Patterson AFB, OH 45433

Abstract

A new formulation is presented for the analysis and synthesis of signals. This formulation
is composed of a variable width window and a linear combination of weighted orthonormal
functions. Tradeoffs in the specification of windows are examined. A sinusoidal example is
considered, and a fast algorithm is provided for its evaluation.


This work was supported in part by the Air Force Office of Scientific Research under Grant
No. AFOSR-G 1 6-92-001 h.
I. Introduction

In short time Fourier analysis, a signal is multiplied by a window and then the Fourier
transform is computed (see, for example, Oppenheim and Schafer [1]). The result of this
transformation is not uniquely defined unless the window is specified. Towards this end,
Harris [2] provides an encyclopedic presentation on windows.

These windows can be chosen to offer a great deal of flexibility for the user, but any
windowing process inevitably limits the accuracy of real time spectral estimation. In an
effort to overcome some of these limitations, Princen and Bradley [3] presented a technique
that utilized a basis made from the product of a window and a sinusoidal function. The
generality of their results was limited by the assumptions that the windows would be of
constant length and would have fifty percent overlap with adjacent windows.

Using an approach which is conceptually similar to the earlier work of Princen and
Bradley, Cassereau [4] introduced and Malvar [5] further investigated a technique called the
Lapped Orthogonal Transform (LOT). Although Malvar also constrained the windows to be
of constant length and to have fifty percent overlap, he [6] was able to obtain perceptual
improvements in the coding of speech through the “elimination of noise (extraneous tones)”
that was associated with the edge effects of traditional windows. Recently, Akansu and
Wadas [7] applied Malvar’s LOT to the coding of images and found the energy
compaction of LOT to be superior to block transforms for all cases considered. Generalizing the
work of Malvar, Coifman and Meyer [8] provided conditions for windows of variable length.

Building on this body of knowledge, a more generalized formulation is presented
for the analysis and synthesis of signals. This formulation involves a family of weighted
orthonormal bases of functions and a family of windows. Each window may vary in length,
in peak amplitude, and in percent overlap with other windows. The following discussion will
be limited to one dimensional functions, but the extension to multidimensional results can
be achieved, in a straightforward manner, by representing higher dimensional functions as a
tensor product of unidimensional functions, as is usually done in transform image coding
[9].

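II. A New Paradigm

We begin by partitioning the real line $\mathbb{R}$ with the strictly increasing sequence
$\{a_j \mid j \in \mathbb{Z}\}$ so that $\mathbb{R} = \bigcup_{j \in \mathbb{Z}} [a_j, a_{j+1}]$. For each $j \in \mathbb{Z}$ let
$\{f_{j,k} \mid k \in \mathbb{N}\}$ denote a real weighted orthonormal basis defined on the interval
$I_j = [a_j, a_{j+1}]$ where orthogonality is measured with respect to the weight function $p_j(x)$, that is,

$$\int_{a_j}^{a_{j+1}} p_j(x)\, f_{j,k}(x)\, f_{j,m}(x)\, dx = \delta_{k,m} .$$

At each point $a_j$ we center an interval, namely $[a_j - \epsilon_j, a_j + \epsilon_j]$ with $\epsilon_j > 0$. And to
guarantee that this interval does not overlap with the interval centered at $a_{j+1}$ we require
$\epsilon_{j+1} + \epsilon_j < a_{j+1} - a_j$. Observe the redundancy with the overlapping intervals and
$\mathbb{R} = \bigcup_{j \in \mathbb{Z}} [a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1}]$. We define the extensions $\tilde f_{j,k}$ by constructing the odd
extension of $f_{j,k}$ on $(a_j - \epsilon_j, a_j)$ and the even extension of $f_{j,k}$ on $(a_{j+1}, a_{j+1} + \epsilon_{j+1})$.
Specifically, $\tilde f_{j,k}$ can be expressed as

$$\tilde f_{j,k}(x) = \begin{cases}
0 , & -\infty < x < a_j - \epsilon_j \\
-f_{j,k}(2a_j - x) , & a_j - \epsilon_j \le x < a_j \\
f_{j,k}(x) , & a_j \le x \le a_{j+1} \\
f_{j,k}(2a_{j+1} - x) , & a_{j+1} < x \le a_{j+1} + \epsilon_{j+1} \\
0 , & a_{j+1} + \epsilon_{j+1} < x < \infty .
\end{cases}$$

Let $\tilde p_j$ denote the even extension of $p_j$ about both endpoints, specifically,

$$\tilde p_j(x) = \begin{cases}
0 , & -\infty < x < a_j - \epsilon_j \\
p_j(2a_j - x) , & a_j - \epsilon_j \le x < a_j \\
p_j(x) , & a_j \le x \le a_{j+1} \\
p_j(2a_{j+1} - x) , & a_{j+1} < x \le a_{j+1} + \epsilon_{j+1} \\
0 , & a_{j+1} + \epsilon_{j+1} < x < \infty .
\end{cases}$$

To simplify notation, let $g_{j,k}(x) = \sqrt{\tilde p_j(x)}\, \tilde f_{j,k}(x)$.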

Let $W_j(x)$ denote the window function supported on the interval $(a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1})$
having a peak amplitude of $A_j$. The amplitude-normalized window $w_j(x)$ is given by
$w_j(x) = W_j(x)/A_j$. We choose amplitude-normalized windows $w_j(x)$ with the following
properties (see Coifman and Meyer [8]):

(a) $w_j(x) = 1$ for $x \in (a_j + \epsilon_j, a_{j+1} - \epsilon_{j+1})$

(b) $w_j(x) = 0$ for $x \notin (a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1})$

(c) $w_j(a_j - s) = w_{j-1}(a_j + s)$ for $s \in [-\epsilon_j, \epsilon_j]$

(d) $w_j^2(x) + w_{j-1}^2(x) = 1$ for $x \in [a_j - \epsilon_j, a_j + \epsilon_j]$

Now we form new functions $u_{j,k}$ which are supported on the interval $(a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1})$
for each $k \in \mathbb{N}$. Each function is the product of the amplitude-normalized window $w_j(x)$
and the symmetric extension $g_{j,k}(x)$, that is, $u_{j,k}(x) = w_j(x)\, g_{j,k}(x)$. A proof of the theorem
that $\{u_{j,k} \mid j \in \mathbb{Z}, k \in \mathbb{N}\}$ is an orthonormal basis for $L^2(\mathbb{R})$ is given in Suter and Oxley
[11].

III.  Continuously  Differentiable  Orthonormal  Functions 

If $f_{j,k}$ is chosen to be continuously differentiable on $(a_j, a_{j+1})$ then the extension $\tilde f_{j,k}$ may not
be continuously differentiable on $(a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1})$. From the signal processing point of
view, “noise” may be generated as a result of piecing together $f_{j,k}(x)$ with its odd extension
at $x = a_j$ and even extension at $x = a_{j+1}$. To minimize this “noise” the following
regularity conditions are imposed on $f_{j,k}$ at the endpoints to guarantee that $\tilde f_{j,k}$ is continuously
differentiable on $(a_j - \epsilon_j, a_{j+1} + \epsilon_{j+1})$.

(a) $f_{j,k}(a_j) = 0$ and $f'_{j,k}(a_j^+)$ exists

(b) $f'_{j,k}(a_{j+1}^-)$ exists and $f'_{j,k}(a_{j+1}^-) = 0$.

As an example, assume $f_{j,k}$ has the form of a sinusoidal function on $[a_j, a_{j+1}]$, in particular,

$$f_{j,k}(x) = A_{j,k} \sin\left[ \omega_{j,k} \left( \frac{x - a_j}{a_{j+1} - a_j} \right) \right]$$

where $A_{j,k}$ and $\omega_{j,k}$ are to be determined. By construction $f_{j,k}(a_j) = 0$ and both
$f'_{j,k}(a_{j+1})$ and $f'_{j,k}(a_j)$ exist. The condition $f'_{j,k}(a_{j+1}) = 0$ yields

$$A_{j,k} \left( \frac{\omega_{j,k}}{a_{j+1} - a_j} \right) \cos(\omega_{j,k}) = 0$$

hence, choose $\omega_{j,k} = \pi(k + 1/2)$ for each $k \in \mathbb{N}$ and $j \in \mathbb{Z}$. We choose $A_{j,k}$ such that
$f_{j,k}$ has unit energy norm, so

$$\int_{a_j}^{a_{j+1}} A_{j,k}^2 \sin^2\left[ \pi(k + 1/2) \left( \frac{x - a_j}{a_{j+1} - a_j} \right) \right] dx = A_{j,k}^2 \left( \frac{a_{j+1} - a_j}{2} \right) = 1$$

yields the orthonormal basis

$$f_{j,k}(x) = \sqrt{\frac{2}{a_{j+1} - a_j}}\, \sin\left[ \pi(k + 1/2) \left( \frac{x - a_j}{a_{j+1} - a_j} \right) \right] .$$

Coifman and Meyer presented this orthonormal basis in their recent paper [8]. Notice that
the weight function $p_j(x) = 1$ for $j \in \mathbb{Z}$ has a continuously differentiable extension $\tilde p_j(x)$.
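The orthonormality of the sinusoidal example on $[a_j, a_{j+1}]$ (with unit weight) can be checked numerically. The sketch below is illustrative only: the function names, the endpoints used in the usage note, and the midpoint quadrature are assumptions made for the check.

```python
import numpy as np

def sine_basis(k, x, a_j, a_j1):
    """f_{j,k}(x) = sqrt(2/(a_{j+1}-a_j)) * sin(pi*(k+1/2)*(x-a_j)/(a_{j+1}-a_j))."""
    length = a_j1 - a_j
    return np.sqrt(2.0 / length) * np.sin(np.pi * (k + 0.5) * (x - a_j) / length)

def gram_matrix(n_funcs, a_j, a_j1, n_quad=4000):
    """Midpoint-rule approximation of the inner products <f_{j,k}, f_{j,m}> on [a_j, a_{j+1}]."""
    step = (a_j1 - a_j) / n_quad
    x = a_j + (np.arange(n_quad) + 0.5) * step
    F = np.stack([sine_basis(k, x, a_j, a_j1) for k in range(n_funcs)])
    return F @ F.T * step
```

For instance, `gram_matrix(6, 0.5, 2.0)` should be close to the 6 by 6 identity matrix, and `sine_basis(k, a_j, a_j, a_j1)` is zero for every `k`, in line with the first regularity condition.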

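IV. An Example of Windows

In section II, conditions were given for the amplitude-normalized window function $w_j(x)$.
In this section, we give an example of $w_j(x)$. Assume the form of window in the interval
$[a_{j+1} - \epsilon_{j+1}, a_{j+1} + \epsilon_{j+1}]$ to be

$$w_j(x) = \cos\big( B_j \{ x - [a_{j+1} - \epsilon_{j+1}] \} \big) \quad \text{for } a_{j+1} - \epsilon_{j+1} \le x \le a_{j+1} + \epsilon_{j+1}$$

which, by construction, satisfies the condition $w_j(a_{j+1} - \epsilon_{j+1}) = 1$. At the other endpoint,
we require $w_j(a_{j+1} + \epsilon_{j+1}) = 0$. Hence,

$$\cos(B_j\, 2\epsilon_{j+1}) = 0$$

implies the choice $B_j = \pi / (4\epsilon_{j+1})$ for each $j \in \mathbb{Z}$. Therefore

$$w_j(x) = \cos\left( \frac{\pi}{4\epsilon_{j+1}} \{ x - [a_{j+1} - \epsilon_{j+1}] \} \right) \quad \text{for } a_{j+1} - \epsilon_{j+1} \le x \le a_{j+1} + \epsilon_{j+1} .$$

Invoking symmetry, the window function becomes

$$w_j(x) = \begin{cases}
0 , & -\infty < x \le a_j - \epsilon_j \\
\cos\left( \frac{\pi}{4\epsilon_j} \{ x - [a_j + \epsilon_j] \} \right) , & a_j - \epsilon_j < x < a_j + \epsilon_j \\
1 , & a_j + \epsilon_j \le x \le a_{j+1} - \epsilon_{j+1} \\
\cos\left( \frac{\pi}{4\epsilon_{j+1}} \{ x - [a_{j+1} - \epsilon_{j+1}] \} \right) , & a_{j+1} - \epsilon_{j+1} < x < a_{j+1} + \epsilon_{j+1} \\
0 , & a_{j+1} + \epsilon_{j+1} \le x < \infty .
\end{cases}$$

The advantage of this window is its simple implementation. The disadvantage of this window
is that it is not differentiable at $x = a_j - \epsilon_j$ and $x = a_{j+1} + \epsilon_{j+1}$. Examples of windows which
are differentiable at the endpoints are given in Suter and Oxley [11].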
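A short numerical check can confirm that the cosine window satisfies the overlap properties (c) and (d) of section II. The sketch below assumes the rising edge is $\cos\big(\tfrac{\pi}{4\epsilon_j}(x - [a_j + \epsilon_j])\big)$, obtained from the falling edge by symmetry; the function name and the sample partition in the usage note are illustrative assumptions.

```python
import numpy as np

def cosine_window(x, a_j, a_j1, eps_j, eps_j1):
    """Amplitude-normalized cosine window w_j on (a_j - eps_j, a_{j+1} + eps_{j+1})."""
    x = np.asarray(x, dtype=float)
    w = np.zeros_like(x)
    rise = (x > a_j - eps_j) & (x < a_j + eps_j)
    flat = (x >= a_j + eps_j) & (x <= a_j1 - eps_j1)
    fall = (x > a_j1 - eps_j1) & (x < a_j1 + eps_j1)
    w[rise] = np.cos(np.pi / (4.0 * eps_j) * (x[rise] - (a_j + eps_j)))
    w[flat] = 1.0
    w[fall] = np.cos(np.pi / (4.0 * eps_j1) * (x[fall] - (a_j1 - eps_j1)))
    return w
```

With partition points 0, 1, 2.5 and half-widths 0.2, 0.3, 0.4 (values chosen only for illustration, and satisfying the non-overlap requirement), the squares of the two windows meeting at $x = 1$ sum to one across the overlap $[0.7, 1.3]$.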
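V. Coefficient Evaluation

Let $s(x)$ be a measured signal with finite energy, that is, $s \in L^2(\mathbb{R})$. Expanding $s(x)$ in
terms of the orthonormal basis $\{u_{j,k} \mid j \in \mathbb{Z}, k \in \mathbb{N}\}$ yields

$$s(x) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{N}} \alpha_{j,k}\, u_{j,k}(x)$$

where $\alpha_{j,k} = \langle s, u_{j,k} \rangle$. The coefficients can be rewritten as (see [11] for the details)

$$\alpha_{j,k} = \int_{a_j}^{a_{j+1}} h_j(x)\, \sqrt{p_j(x)}\, f_{j,k}(x)\, dx$$

where

$$h_j(x) = \begin{cases}
s(x) w_j(x) - s(2a_j - x)\, w_j(2a_j - x) , & a_j < x < a_j + \epsilon_j \\
s(x) w_j(x) , & a_j + \epsilon_j \le x \le a_{j+1} - \epsilon_{j+1} \\
s(x) w_j(x) + s(2a_{j+1} - x)\, w_j(2a_{j+1} - x) , & a_{j+1} - \epsilon_{j+1} < x < a_{j+1} .
\end{cases}$$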
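The folding step can be illustrated by comparing the inner product of $s$ with the windowed, extended basis function over the full support against the folded integral over $[a_j, a_{j+1}]$. The sketch below makes several assumptions for illustration: unit weight $p_j = 1$, the sinusoidal basis of section III, the cosine window of section IV, one fixed interval with hand-picked endpoints and half-widths, and midpoint quadrature. All names are hypothetical.

```python
import numpy as np

# Illustrative interval: a_j, a_{j+1}, eps_j, eps_{j+1}
A0, A1, E0, E1 = 0.0, 1.0, 0.2, 0.3

def window(x):
    """Cosine window w_j for the interval above."""
    x = np.asarray(x, dtype=float)
    w = np.zeros_like(x)
    rise = (x > A0 - E0) & (x < A0 + E0)
    flat = (x >= A0 + E0) & (x <= A1 - E1)
    fall = (x > A1 - E1) & (x < A1 + E1)
    w[rise] = np.cos(np.pi / (4 * E0) * (x[rise] - (A0 + E0)))
    w[flat] = 1.0
    w[fall] = np.cos(np.pi / (4 * E1) * (x[fall] - (A1 - E1)))
    return w

def f(k, x):
    """Sinusoidal basis function f_{j,k} on [a_j, a_{j+1}]."""
    return np.sqrt(2 / (A1 - A0)) * np.sin(np.pi * (k + 0.5) * (x - A0) / (A1 - A0))

def f_ext(k, x):
    """Odd extension about a_j, even extension about a_{j+1}, zero elsewhere."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    left = (x >= A0 - E0) & (x < A0)
    mid = (x >= A0) & (x <= A1)
    right = (x > A1) & (x <= A1 + E1)
    out[left] = -f(k, 2 * A0 - x[left])
    out[mid] = f(k, x[mid])
    out[right] = f(k, 2 * A1 - x[right])
    return out

def h(s, x):
    """Folded signal h_j(x) on (a_j, a_{j+1})."""
    x = np.asarray(x, dtype=float)
    out = s(x) * window(x)
    lo = (x > A0) & (x < A0 + E0)
    hi = (x > A1 - E1) & (x < A1)
    out[lo] -= s(2 * A0 - x[lo]) * window(2 * A0 - x[lo])
    out[hi] += s(2 * A1 - x[hi]) * window(2 * A1 - x[hi])
    return out

def midpoint_integral(g, lo, hi, n=20000):
    """Midpoint-rule quadrature of g over [lo, hi]."""
    step = (hi - lo) / n
    x = lo + (np.arange(n) + 0.5) * step
    return float(np.sum(g(x)) * step)
```

For a smooth test signal, integrating `s(x) * window(x) * f_ext(k, x)` over the full support and integrating `h(s, x) * f(k, x)` over $[a_j, a_{j+1}]$ should agree to within the quadrature error.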

VI.  Coefficient  Evaluation  for  Sinusoidal  Functions 

This section is a summary of the algorithm required to generate the coefficients of an input data
function using the sinusoidal basis example of section III. The derivation of the following
algorithm is provided in [11].

Assume that the signal $s(x)$ is sampled at the rate $\delta > 0$ (the spacing between samples). Let
$a_j$ be chosen at the sampled $x$'s so that $a_{j+1} - a_j = \delta N_j$ where $N_j$ is a positive integer. Thus,
there are $N_j$ samples taken in the interval $(a_j, a_{j+1}]$, and we denote $x_{j,l} = a_j + \delta l$ for
$l = 0, 1, \ldots, N_j$. Choose $\epsilon_j = \delta M_j$ where $M_j$ is also a positive integer. The nonoverlapping
condition, $\epsilon_{j+1} + \epsilon_j < a_{j+1} - a_j$, implies $M_{j+1} + M_j < N_j$. For each $j \in \mathbb{Z}$ we perform the
following steps:

(1) Obtain data $s(x_{j,l})$ for $a_j - \epsilon_j \le x_{j,l} \le a_{j+1} + \epsilon_{j+1}$,
that is, $l = -M_j, \ldots, 0, 1, \ldots, N_j + M_{j+1}$.

(2) Multiply $s(x_{j,l})$ by window $w_j(x_{j,l})$ for $l = -M_j, \ldots, 0, 1, \ldots, N_j + M_{j+1}$.

(3) Fold in the sequence by defining

$$H_{j,l} = \begin{cases}
s(x_{j,l}) w_j(x_{j,l}) - s(2a_j - x_{j,l})\, w_j(2a_j - x_{j,l}) , & a_j \le x_{j,l} < a_j + \epsilon_j \\
s(x_{j,l}) w_j(x_{j,l}) , & a_j + \epsilon_j \le x_{j,l} \le a_{j+1} - \epsilon_{j+1} \\
s(x_{j,l}) w_j(x_{j,l}) + s(2a_{j+1} - x_{j,l})\, w_j(2a_{j+1} - x_{j,l}) , & a_{j+1} - \epsilon_{j+1} < x_{j,l} \le a_{j+1}
\end{cases}$$

for $l = 0, 1, \ldots, N_j$.

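(4) Define new array

$$\beta_{j,l} = H_{j,l} \sin\left( \frac{\pi l}{2N_j} \right) \quad \text{for } l = 0, 1, \ldots, N_j .$$

(5) Define the even extension of $\beta_{j,l}$

$$\tilde\beta_{j,l} = \begin{cases}
\beta_{j,l} , & 0 \le l \le N_j - 1 \\
\beta_{j,2N_j - l} , & N_j \le l \le 2N_j - 1 .
\end{cases}$$

(6) Perform an FFT of length $2N_j$ on $\tilde\beta_{j,l}$ (see, for example, Ferguson [10]) and define

$$D_{j,k} = \frac{1}{2N_j} \sum_{l=0}^{2N_j - 1} \tilde\beta_{j,l}\, e^{-i 2\pi k l / (2N_j)} .$$

(7) Interpret the results of the FFT

$$\hat\alpha_{j,k} = \hat\alpha_{j,k-1} + 2\sqrt{2\delta N_j}\, D_{j,k}$$

where $\hat\alpha_{j,k}$ approximates the coefficients $\alpha_{j,k}$.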
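Steps (4) to (7) can be sketched and checked against a direct sum. The sketch rests on a reading of the algorithm that should be treated as an assumption: the array of step (4) is $H_{j,l}\sin(\pi l / 2N_j)$, the FFT output is scaled by $1/(2N_j)$, step (7) is the recursion $\hat\alpha_{j,k} = \hat\alpha_{j,k-1} + 2\sqrt{2\delta N_j}\,D_{j,k}$, and the integral is approximated by the trapezoidal rule. The starting value $\hat\alpha_{j,0} = \sqrt{2\delta N_j}\,D_{j,0}$ is derived here from the identity $\sin A - \sin B = 2\cos\frac{A+B}{2}\sin\frac{A-B}{2}$ and is not stated in the text. Function names are hypothetical.

```python
import numpy as np

def coeffs_direct(H, delta):
    """Trapezoidal approximation of alpha_k = int h(x) f_k(x) dx for k = 0..N-1."""
    N = len(H) - 1
    l = np.arange(N + 1)
    weights = np.ones(N + 1)
    weights[0] = weights[-1] = 0.5                      # trapezoidal end weights
    k = np.arange(N)[:, None]
    basis = np.sqrt(2.0 / (delta * N)) * np.sin(np.pi * (k + 0.5) * l / N)
    return delta * basis @ (weights * H)

def coeffs_fft(H, delta):
    """Same coefficients via one length-2N FFT and a first-order recursion."""
    N = len(H) - 1
    l = np.arange(N + 1)
    beta = H * np.sin(np.pi * l / (2 * N))              # step (4)
    beta_ext = np.concatenate([beta[:N], beta[N:0:-1]])  # step (5): even extension, length 2N
    D = np.fft.fft(beta_ext).real / (2 * N)             # step (6): real for an even sequence
    scale = np.sqrt(2.0 * delta * N)
    alpha = np.empty(N)
    alpha[0] = scale * D[0]                             # derived starting value (assumption)
    for k in range(1, N):
        alpha[k] = alpha[k - 1] + 2.0 * scale * D[k]    # step (7)
    return alpha
```

Under these assumptions the two routines agree to rounding error, while the FFT version costs one length-$2N_j$ transform plus $O(N_j)$ additions.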

VII.  Reconstruction  of  Signal  Using  Sinusoidal  Functions 

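In this section we give a summary of the algorithm to reconstruct the signal using the
sinusoidal basis example of section III. The derivation of this algorithm is provided in [11].

Assume we have the set of coefficients $\{\alpha_{j,k} \mid j \in \mathbb{Z}, k \in \mathbb{N}\}$. We wish to reconstruct
the signal $s(x)$ at the values of $x_{j,l} = a_j + \delta l$ for each $j \in \mathbb{Z}$ and $l = 0, 1, \ldots, N_j$. (See section
VI.) For each $j \in \mathbb{Z}$ perform the following steps:

(1) Define odd extension of $\alpha_{j,k}$ for $k = 0, 1, 2, \ldots, N_j - 1$ by

$$\tilde\alpha_{j,k} = \begin{cases}
\alpha_{j,k} , & 0 \le k \le N_j - 1 \\
-\alpha_{j,2N_j - k - 1} , & N_j \le k \le 2N_j - 1 .
\end{cases}$$

(2) Perform an FFT of length $2N_j$ on $\tilde\alpha_{j,k}$ for $k = 0, 1, 2, \ldots, 2N_j - 1$ and produce the
sequence

$$H_{j,l} = \frac{1}{\sqrt{2\delta N_j}}\, \mathrm{Im}\left[ e^{i\pi l/(2N_j)} \sum_{k=0}^{2N_j - 1} \tilde\alpha_{j,k}\, e^{i 2\pi k l/(2N_j)} \right] .$$

(3) Reconstruct the signal on the interval $[a_j, a_{j+1}]$ at the data points $x_{j,l} = a_j + \delta l$ for
$l = 0, 1, 2, \ldots, N_j$ by using

$$S_{j,l} = \begin{cases}
w_j(x_{j,l})\, H_{j,l} + w_j(2a_j - x_{j,l})\, H_{j-1,\,N_{j-1} - l} , & a_j \le x_{j,l} < a_j + \epsilon_j \\
H_{j,l} , & a_j + \epsilon_j \le x_{j,l} \le a_{j+1} - \epsilon_{j+1} \\
w_j(x_{j,l})\, H_{j,l} - w_j(2a_{j+1} - x_{j,l})\, H_{j+1,\,N_j - l} , & a_{j+1} - \epsilon_{j+1} < x_{j,l} \le a_{j+1} .
\end{cases}$$

The values $S_{j,l}$ will approximate the signal evaluated at $x_{j,l}$, that is, $S_{j,l} \approx s(x_{j,l})$.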
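The FFT step of the reconstruction can be checked against a direct evaluation of the truncated sine series. The sketch assumes that the folded samples are $H_{j,l} = \sqrt{2/(\delta N_j)}\,\sum_{k=0}^{N_j-1} \alpha_{j,k} \sin\big(\pi(k+\tfrac12) l / N_j\big)$ and that the odd extension is $\tilde\alpha_{j,k} = -\alpha_{j,2N_j-1-k}$ for $k \ge N_j$; the normalization is derived from these assumptions, and the function names are hypothetical.

```python
import numpy as np

def synth_direct(alpha, delta):
    """H_l = sqrt(2/(delta*N)) * sum_k alpha_k sin(pi*(k+1/2)*l/N), for l = 0..N."""
    N = len(alpha)
    l = np.arange(N + 1)[:, None]
    k = np.arange(N)
    return np.sqrt(2.0 / (delta * N)) * (np.sin(np.pi * (k + 0.5) * l / N) @ alpha)

def synth_fft(alpha, delta):
    """Same samples from one length-2N inverse FFT of the odd extension."""
    alpha = np.asarray(alpha, dtype=float)
    N = len(alpha)
    ext = np.concatenate([alpha, -alpha[::-1]])   # ext[k] = -alpha[2N-1-k] for k >= N
    l = np.arange(N + 1)
    # np.fft.ifft includes a 1/(2N) factor; multiply it back to get the plain sum
    spec = 2 * N * np.fft.ifft(ext)
    return np.imag(np.exp(1j * np.pi * l / (2 * N)) * spec[: N + 1]) / np.sqrt(2.0 * delta * N)
```

Combining the resulting samples with the window values of neighbouring intervals, as in step (3), is left out of this sketch.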

VIII.  Conclusions 


A formulation was presented for the analysis and synthesis of signals. This formulation
permitted (a) a variable window length, (b) a variable percent overlap between windows,
and (c) an arbitrary orthonormal basis inside the analysis interval. A sinusoidal example
was examined and fast algorithms were provided for both coefficient evaluation and signal
reconstruction. It is important to note that both the spectrum generation and the
reconstruction have the same complexity as the Fast Fourier Transform. Future planned
work includes (a) utilization of polynomial basis functions and (b) application of this general
approach to the analysis of speech signals and the synthesis of computer graphics.
MULTI-RESOLUTION ESTIMATION FOR
IMAGE PROCESSING AND FUSION

Robert R. Tenney¹

Alan S. Willsky²

This paper introduces a set of estimation algorithms based on multi-resolution models of
random fields. The models of interest include statistical representations of terrain and other
geophysical phenomena. Their structure employs a series of successively finer representations of
the process. The estimation algorithms exploit this structure to combine information from different
areas, perhaps at different resolutions, into updated estimates of the process variables. The
algorithms extend recent work on estimation for stochastic tree processes; the descriptions of the
images of interest require a somewhat more general structure than existing tree processes permit.
We present a general theory of estimation on acyclic graphs, and specialize the results to one- and
two-dimensional processes where the scale-to-scale relationships are midpoint deflection
processes. Applications of this work will include anomaly detection, change detection,
segmentation, and reconstruction with data from imaging sensors.
OVERVIEW

Real-time video sensor technology has become quite mature, and therefore relatively
inexpensive and widely used. Computing power continues to drop in cost, particularly when an
algorithm can be structured to match massively parallel architectures. However, algorithm
technology has not yet matured to the point where that computing power can be applied to real-time
video processing for much more than simple image enhancement.

In addition, newer imaging techniques exploit very different physical phenomena which,
while not providing the clarity expected from video sensors, can provide information about
important phenomena that otherwise would be extremely difficult to observe. The lower signal to
noise ratios of these sensors demand algorithms based on statistical and dynamic techniques,
instead of the prevalent, largely deterministic approaches to feature extraction and identification.

The purpose of this effort was to develop efficient estimation algorithms to recover
important information from real-time video imagery, information which is vital to subsequent
monitoring or control applications. To achieve this goal, the algorithms must meet five criteria.
First, they should have an efficient computational structure, measured in terms of the total number
of numerical operations required. Second, they should have a high degree of structural regularity
which can be mated with parallel computing architectures. Third, they should be based on explicit
models of the physical processes which generate the imagery of interest. Fourth, those models
should be statistical in nature, as Bayesian probability theory remains the most complete
mathematical representation of uncertainty. Finally, the point of departure for this work is that
those models have a multi-scale structure, where large structures find representation at coarse
scales of the model, and local structures appear at the finer scales.

The mathematical approach taken here is more general than necessary, in order to remove
any dependencies on the particular topology of the multi-scale model. The basic estimation theory
derives from the properties of Bayesian networks, where a network representation highlights direct
statistical connections among an arbitrary set of variables. The application of the basic theory to
image processing then becomes an exercise in representing the scale-to-scale dependencies of an
image in terms of these networks.

The most evocative application of the class of models investigated here is in environmental
or terrain reconstruction. The same models studied here have been widely used to generate
synthetic, fractal landscapes in the training and entertainment industries. A slight change in a
generating function permits synthesis of images with a granular or crystalline structure. This work
addresses two questions: 1) are multi-scale models able to represent real physical processes of
interest?, and 2) do they lead to computationally efficient estimation algorithms? The answer to
both is “yes”, and the major product of this work is a foundation for algorithms which estimate the
parameters which characterize the process.

OBJECTIVES 

Conventional image processing techniques operate directly on the spatial, pixel-level data of
an image. Alternative techniques, inspired by the multi-scale functional bases of the affine wavelet
transform, represent the image at successively finer levels of detail, and allocate processing power
across scale as well as space. Potential advantages of these techniques include 1) the ability to
explicitly represent processes which truly are a product of multi-scale phenomena, such as ocean
waves; 2) the ability to process data which comes from sources of different resolution (scale),
such as synthetic aperture radar and overhead infrared photography; and 3) the ability to obtain
computational advantages from the allocation of processing power to appropriate scales and spatial
areas, such as for image segmentation.

Model-based approaches to signal processing require a model of the process of interest,
which includes (hypothesized) causal connections among the random elements of the signal.
Multi-scale models range from the simple (where the random elements are coefficients on a set of
dyadic, affine wavelet basis functions) to the complex (where successively finer scales of the
signal representation are complicated functions of variables expressed at coarser scales).

Model-based techniques will only be successful if the resulting algorithms are accurate and
efficient. Accuracy results from an inherent compatibility between the structure of the model and
the structure of the process of interest. Efficiency is largely the result of regularity in the
computational structure of the algorithms which are derived from the model. The chances of
establishing regularity in an algorithm are greatly enhanced if the model has some degree of
regularity as well.

Given this context, the purpose of this effort is to investigate whether multi-scale models of
image production could lead to accurate and efficient estimation algorithms for image processing,
with the long-term goal of embedding such algorithms into real-time image processing systems for
image enhancement, object detection, or parameter estimation. Accuracy results from an approach
which calculates the optimum estimate of each variable of interest, so that performance
degradations result only from model mismatches and fundamental physical principles, not from
approximations built into the solution algorithm. Efficiency results from the regular structure of a
multi-scale model, where each refinement layer has the same structure as any other; layers differ
only in numerical size and in scale-related parameters. To maintain accuracy, the mathematical
foundation adopted for this work is Bayesian probability theory; to maintain efficiency, the
multi-scale models have the same structure as both affine wavelet decompositions and mid-point
deflection models of synthetic terrain, both of which lead to exceedingly fast computational
techniques in other applications [2, 3, 5, 6, 18, 19, 27].

MODELING  APPROACH 

This  section  presents  the  rationale  for  investigating  multi-scale  models  of  multi-dimensional 
random  fields  as  the  basis  for  a  new  class  of  estimation  algorithms.  Subsequent  sections  will 
provide  an  overview  of  the  mathematical  elements  of  the  models,  key  issues  that  drive  estimation 
algorithm design, and examples of the processes that can be treated in this framework.

Models

Mathematical  model  building  has  been  an  essential  element  of  scientific  progress  ever  since 
the  Renaissance.  Building  a  model  of  a  physical  process  that  closely  replicates  observed  behavior 
is  substantial  evidence  that  the  process  is  in  fact  well  understood.  Models  take  many  different 
forms,  and  it  is  difficult  to  generalize  about  them,  but  there  appears  to  be  an  interesting  shift  in 
modeling  perspective  taking  place  at  the  end  of  the  twentieth  century. 

Prior  to  Newton,  models  were  largely  static:  a  collection  of  relationships  and  numerical 
invariants.  Newton  and  Leibnitz  established  a  line  of  modeling  techniques  that  is  even  more  useful 
today  than  in  the  seventeenth  century.  Differential  equations,  and  their  stochastic  counterparts, 
have  led  to  magnificent  successes  in  the  analysis,  estimation,  and  control  of  a  myriad  of  physical 
processes.  Their  applicability  to  image  processing  problems,  however,  has  been  limited  by  two 
factors.  First,  prevalent  digital  computing  techniques  impose  limitations,  based  on  numerical 
effects,  on  the  range  of  scale  which  can  be  employed  in  algorithms  based  on  this  class  of  model  — 
the choice of step size in numerical integration being but a simple example of the tradeoffs
imposed.  Second,  their  extension  to  multi-dimensional  processes,  i.e.  partial  differential  equations 
and  their  stochastic  counterparts,  has  not  yielded  a  class  of  models  whose  sample  functions  appear 
realistic for many application contexts.

Fourier  solidified  an  alternate  view  of  physical  processes  based  not  on  incremental  change 
over  time,  but  on  composition  of  a  large  collection  of  elements  which  extend  across  all  time. 
Representing  a  signal  as  a  weighted  superposition  of  basis  elements  immediately  allows  a  wide 
variety  of  scales  to  be  represented  and,  at  least  for  one-dimensional  signals,  allows  one  to 
synthesize  samples  with  a  high  degree  of  verisimilitude  for  many  applications.  Even  better,  it  leads 
to extremely fast digital techniques for both synthesis and analysis. Again, there are limitations.
The  fundamental  reliance  on  linearity  restricts  the  class  of  phenomena  which  can  be  captured  by 
these  models.  Also,  one  cannot  use  these  models  as  synthesis  tools  for  two-dimensional  images 
—  the  set  of  transform  coefficients  which  correspond  to  realistic  images,  at  least  in  the  optical 
spectrum, cannot be easily characterized.

Common  to  both  perspectives  is  the  fact  that  models  built  on  differential  or  spectral 
structures  have  a  great  deal  of  difficulty  synthesizing  realistic  images  of  common  physical  entities 
such as terrain. If a model cannot reliably synthesize samples which are representative of the class
of  images  of  interest,  the  risk  of  model  mismatch  leading  to  ineffective  algorithms  is  undoubtedly 
high.  Nonetheless,  many  of  today’s  best  image  processing  techniques  do  in  fact  rely  on  various 
combinations  of  differential  or  spectral  techniques,  although  not  always  based  on  a  common, 
clearly  specified  model. 

What  alternative  is  there?  The  emerging  field  of  wavelets  [6, 12, 13, 14, 22, 23]  can  be 
viewed  as  yet  another  superpositional  model,  offering  an  enlarged  set  of  basis  functions  to 
represent  non-stationary  effects.  This  view  is  much  too  restrictive,  however.  The  affine  wavelet 
transform, and the multi-scale models it implies [17, 35], open the door to an entirely new kind of
causality  based  on  neither  incremental  change  nor  superposition:  causality  from  scale  to  scale.  To 
fully  exploit  this  class  of  models  without  inheriting  the  weaknesses  of  other  spectral  models 
requires  some  modification,  however.  One  key  modification  is  to  introduce  explicit  coupling 
between  the  wavelet  coefficients  at  successive  scales,  coupling  that  effectively  reduces  the 
enormous  number  of  degrees  of  freedom  presented  by  wavelet  models. 

The resulting multi-scale models lead to a fundamentally different perspective on a random
process. They capture a notion of successive refinement: the fine features of the process may be
dependent on the presence or absence of coarse features, but not the reverse. These models
explicitly represent some form of scale-to-scale causality, and they need not be linear. Multi-scale
techniques  have  proven  extremely  efficient  in  other  applications,  so  empirical  evidence  suggests 
that  estimation  techniques  inferred  from  a  multi-scale  model  should  also  be  extremely 
computationally  efficient. 


Example 

Consider simple, one-dimensional Brownian motion. The Ito calculus provides a precise
description of this stochastic process based on an incremental model. Wiener provided a model of
the same process based on the superposition of sinusoids with random phase. What is a
multi-scale model of the same process?

The  simplest  multi-scale  model  of  Brownian  motion  employs  the  (non-orthonormal) 
wavelet  basis  illustrated  in  Figure  1.  A  series  of  approximations  to  the  sample  path  are  constructed 
by  adding  weighted  translates  of  the  basis  function  at  successively  finer  scales.  The  basis 
functions are continuous, so each approximation must be continuous. If the weights are drawn
from Gaussian distributions, then the joint distributions between all pairs of points in any
approximation are Gaussian also. If the Gaussian distributions for the weights are shift-invariant,
and have variances proportional to the length of the support set of the relevant basis function, then
the statistics of the approximations approach those of Brownian motion. In fact, they are identical
to Brownian process statistics at values of the independent variable s that map into the peaks of the
basis functions at any scale n or coarser (since all basis functions at finer scales are zero at these
points).


[Figure 1, panel (b): Successive refinement through superposition]

Figure 1: Wavelet Model of Brownian Motion. The independent variable, s, lies in the unit interval, and may be
interpreted as time. The process value at s is x(s). (a) The “tent” functions of the basis set; the entire basis set
consists of all dyadically scaled and integrally translated versions of the tent. (b) Construction of a sample path of
Brownian motion by superposition of weighted basis functions. Weights for each basis function must be
independent of one another and drawn from zero-mean Gaussian distributions which are stationary in s, and which
have variances that decrease by a power of 2 at each successively finer scale.

This  is  a  wavelet  interpretation  of  a  class  of  midpoint  deflection  processes  which  have  been 
used  to  construct  fractal  signals  [15, 24, 31, 33, 37, 38].  At  each  scale,  the  model  constructs  a 
value  for  the  midpoint  of  the  process  over  a  set  of  equal  intervals.  At  the  next  scale,  it  builds 
midpoints  for  each  half-interval  of  the  intervals  at  the  preceding,  coarser  scale.  The  linear  tails  of 
the  tent  function  simply  interpolate  values  that  have  not  yet  been  completely  defined  —  the  value  of 
the  process  at  a  point  that  is  not  a  midpoint  of  an  interval  is  simply  a  linear  combination  of  the 
process  values  at  the  neighboring  points  which  have  been  defined. 

Ignore  the  interpolation,  and  focus  just  on  that  finite  set  of  points  which  have  been 
completely  defined  by  a  finite  number  of  scales.  These  points  are  at  values  of  s  which  are  integral 
multiples  of  a  negative  power  of  2.  The  midpoint  of  an  interval  between  two  neighboring  values  of 
s at one scale can be directly constructed: average the values of x at the endpoints (interpolate), and
then add an amount drawn from a Gaussian distribution. Repeating this process leads to a
midpoint construction process, where the stochastic process is defined over an expanding set of
points which, in the limit, are countable but dense in the unit interval.

Figure  2  shows  the  data  dependencies  embedded  in  such  a  midpoint  construction  process. 
Assuming that x(0) and x(1) are given as boundary conditions, a sample of the process can be
generated by successively subdividing intervals, and synthesizing a value for the midpoints of
those intervals. For Brownian motion, it suffices to make the computation of each midpoint
dependent only on the process values at its interval's endpoints, and on a draw from an
independent, Gaussian random variable.




Figure  2:  Data  Dependencies  in  a  Midpoint  Construction  Model.  Circles  denote  values  of  the  process  at  successive 
interval  midpoints.  Rectangles  denote  computations  that  construct  the  midpoints  at  the  next  level  of  refinement. 
Arcs  indicate  the  process  values  on  which  these  computations  may  depend.  The  output  of  each  rectangle  may  also 
depend  on  a  random  value  drawn  from  some  specified  distribution.  For  Brownian  noise,  these  computations  consist 
of  linear  averaging  (with  equal  weights)  to  which  a  draw  from  a  zero-mean  Gaussian  distribution  is  added.  The 
variances  of  the  distributions  decrease  by  a  factor  of  2  from  scale  to  scale. 

Figure  3  shows  the  representation,  at  the  8  coarsest  scales,  of  one  sample  path  of  a  linear 
midpoint  construction  process.  Viewed  as  a  synthesis  model,  generation  of  such  a  sample  path  is 
extremely efficient due to the sparse nature of the dependencies. In fact, the dependency diagram
of Figure 2 clearly depicts the structural self-similarity of the process, as the decomposition of any
interval at one scale proceeds exactly as does the decomposition of any other interval at any other
scale. Adding the requirement that the variances of the Gaussian distributions be proportional to
the length of the interval being bisected assures that the process is statistically self-similar as well
(and  that  the  auto-correlation  function  of  the  process  matches  that  of  a  Brownian  process,  at  least 
for  the  values  of  s  for  which  samples  have  been  constructed). 
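
The midpoint construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function name midpoint_brownian, its parameters, and the choice to draw x(1) from a unit-variance Gaussian when no boundary value is supplied are all assumptions made here. The midpoint variance of length/4 is the standard conditional variance of a Brownian midpoint given both endpoints, so it is proportional to the length of the interval being bisected.

```python
import random


def midpoint_brownian(levels, x0=0.0, x1=None, seed=None):
    """Sample a Brownian path on [0, 1] at the dyadic points k / 2**levels."""
    rng = random.Random(seed)
    if x1 is None:
        # Assumption: with no boundary value given, draw x(1) from N(x0, 1).
        x1 = x0 + rng.gauss(0.0, 1.0)
    path = [x0, x1]
    length = 1.0  # length of the intervals about to be bisected
    for _ in range(levels):
        # Conditional variance of the midpoint given both endpoints is
        # length / 4, so it halves from each scale to the next.
        sigma = (length / 4.0) ** 0.5
        refined = []
        for left, right in zip(path, path[1:]):
            mid = 0.5 * (left + right) + rng.gauss(0.0, sigma)
            refined.extend([left, mid])
        refined.append(path[-1])
        path = refined
        length /= 2.0
    return path
```

Each midpoint depends only on its two interval endpoints and one independent draw, which is the sparse dependency structure of Figure 2. Because the draws are made scale by scale, a path refined to n levels with a fixed seed agrees at every other sample with the path refined to n-1 levels.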

Of  course,  there  is  no  reason  for  the  midpoint  computations  to  be  restricted  to  a  linear- 
Gaussian  structure.  Nor  need  the  values  of  the  process  at  each  value  of  s  be  restricted  to  a  scalar. 
For  synthesis,  evaluation  of  discrete,  nonlinear,  or  vector-valued  functions  is  little  more 
complicated  than  evaluation  of  a  linear  function.  This  is  the  point  at  which  our  work  departs  from 
the  limitations  of  multi-scale  models  supported  by  wavelet  representations. 

Viewed  as  an  analysis  model,  recovery  of  the  representation  of  a  sample  path  at  coarser 
scales from fine scales is even simpler. The representation at scale n is simply decimated, by
a  factor  of  2,  to  form  the  representation  at  the  next  coarser  scale. 
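
As a small sketch of this analysis step, assume the representation at scale n is stored as a list of 2**n + 1 samples that includes both boundary values. That storage convention and the function name coarsen are assumptions made for illustration only.

```python
def coarsen(samples):
    """Return the next coarser representation by keeping every other sample."""
    if len(samples) < 3 or (len(samples) - 1) % 2 != 0:
        raise ValueError("expected an odd number of samples, at least 3")
    # The retained samples are exactly the points constructed at coarser scales.
    return samples[::2]
```

No arithmetic is performed on the retained samples, because points constructed at a coarse scale are never modified by finer scales.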

Viewed  as  the  basis  for  estimation  techniques,  it  is  unclear  whether  models  of  the  form 
shown  in  Figure  2  offer  computational  advantages  —  although  similar  tree-structured  models  have 
led to substantial speedups [4, 8, 9, 11, 32, 35]. (This is precisely the question which this work
answers, in the affirmative.) Estimation techniques will be simplest when the midpoint
construction functions are linear-Gaussian, but even these can be used as the basis for linearized
techniques when nonlinearities appear. (The results developed here deal with arbitrary probability
distributions, but only the linear-Gaussian case is known to lead to an exact, finite-dimensional
implementation.)

The  superficial  similarity  of  samples  of  the  two-dimensional  analog  of  this  model  to  terrain 
cross-sections  has  long  been  recognized  [25, 28, 29, 30].  (It  is  not  entirely  appropriate  as  a  terrain 
model, as the construction process does not prevent isolated depressions which, in real terrain,
would be filled with lakes.) Nonetheless, this work is based on the supposition that multi-scale
models generate much more realistic representations of terrain than can be efficiently generated by
either differential or superpositional approaches, particularly when the generating functions are
allowed to be nonlinear. Moreover, the self-similar structure of the model which lies behind this
sample imparts a self-similar structure to the estimation algorithms derived from it, leading to
extremely  efficient  reconstruction  techniques. 


[Figure 3 panels: Scale 1 through Scale 8]

Figure 3: Refinement of a Sample Path from a Linear Midpoint Construction Process. Midpoints are constructed
as linear averages of neighboring endpoints, to which noise is added. This sample uses triangular distributions for
these increments in lieu of the computationally more expensive Gaussian distribution.

Thus the contribution of this work is the extension of optimal estimation theory to
multi-scale models which, at the very least, can realistically represent terrain and other fractal textures.

MATHEMATICAL  APPROACH 

To  avoid  unnecessary  dependencies  on  the  specifics  of  a  problem,  the  technical  approach  is 
based  on  models  described  as  acyclic  nets  of  random  variables.  One  example  of  such  a  network  is 
that  of  Figure  2  for  one-dimensional  processes;  another  appears  later  for  square  tessellations  on  a 
two-dimensional  random  field.  Because  the  results  apply  to  any  acyclic  net,  a  foundation  exists  for 
immediate transfer to other sensor topologies, such as triangular or hexagonal covers of an image.

Markovian Nets

The  key  assumptions  that  lie  behind  the  results  described  in  this  work  can  be  illustrated 
with Figure 4. As a synthesis tool for random processes, this network describes a partially
ordered set of (pseudo)random computations. Some variables are given, or are drawn from
initially specified distributions. These variables lie in the minimal places (with respect to the partial
order defined by the directed arcs) of the acyclic graph.


Figure 4: Parts of a Markovian Net. Circles denote places which hold the basic random variables of the process.
Rectangles denote transitions, such as t_m, which represent stochastic operators that transform variables in input
places to variables in a unique output place. Directed arcs specify the inputs and output, O(t_m), of each
transition. The bipartite graph must be acyclic, so the arcs also impose a partial order on the places and transitions.
Those places which precede t_m in this partial order are in its past, P(t_m); those which succeed it are in its future,
F(t_m); and every other place is in the unordered set U(t_m).

All other places contain random variables derived from these initial variables. Each
transition is characterized by a probability distribution on the random variables in its output places,
conditioned on the values of the variables in the input places. In a synthesizer of a (pseudo)
random process, this mechanism could be implemented either as a random draw from that
conditional probability distribution, or as some deterministic operator applied to the values of the
inputs and some exogenous random variable. In either case, a conditional probability distribution
on the output variables, given the input variables, characterizes the statistical relations imposed by
each  transition. 
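
Used as a synthesizer, such a net can be sketched as follows. The dictionary layout, the function name sample_net, and the three-place example are hypothetical and chosen only for illustration; each transition is written as a deterministic operator applied to its inputs together with exogenous randomness drawn from rng, which is the second of the two implementations mentioned above.

```python
import random


def sample_net(initial, transitions, seed=None):
    """Draw one joint sample from an acyclic net of random variables.

    initial:     dict mapping each minimal place to a sampler(rng)
    transitions: dict mapping each output place to (input places, operator),
                 where operator(input_values, rng) returns the output value
    """
    rng = random.Random(seed)
    values = {}

    def value_of(place, pending=()):
        if place in values:
            return values[place]
        if place in pending:
            raise ValueError("net must be acyclic")
        if place in initial:
            values[place] = initial[place](rng)
        else:
            inputs, operator = transitions[place]
            args = [value_of(p, pending + (place,)) for p in inputs]
            values[place] = operator(args, rng)
        return values[place]

    for place in list(initial) + list(transitions):
        value_of(place)
    return values


# Hypothetical example: one midpoint computed from two boundary places.
example_initial = {"x0": lambda rng: 0.0, "x1": lambda rng: rng.gauss(0.0, 1.0)}
example_transitions = {
    "xm": (("x0", "x1"), lambda v, rng: 0.5 * (v[0] + v[1]) + rng.gauss(0.0, 0.5)),
}
sample = sample_net(example_initial, example_transitions, seed=0)
```

Every place is evaluated only after its inputs, so the order of computation respects the partial order imposed by the arcs.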

The  key  mathematical  assumption  required  by  this  work  is  that  the  output  variables  of  a 
transition  depend  only  on  the  inputs  and  independent,  exogenous  randomness.  Formally,  this 
requires an independence assumption to the effect that the values of the outputs of a transition are
equally predictable whether (1) only the inputs to the transition are known, or (2) each and every
variable in the past and unordered sets of places (see Figure 4) is known. Clearly (1) is a (small)
subset of (2). This independence assumption is analogous to the Markovian property required of a
state-space description of a conventional Markov process; hence these graphs are called Markovian
nets.

Examples

Figure  5  contains  three  examples  of  random  processes  depicted  as  Markovian  nets.  These 
serve as familiar cases for illustration, for which estimation algorithms are known (at least for
the  first  two  examples).  The  algorithms  developed  in  this  work  reduce  to  those  known  algorithms 
for  the  special  cases  of  Figure  5(a)  and  (b). 


[Figure 5 panels: a) Standard discrete time Markov process; b) Discrete index, quadrant causal
Markov random field; c) Multiscale successive refinement process]


Figure  5:  Common  Random  Processes  Expressed  as  Markovian  Nets,  a)  Discrete  time  Markov  process,  where  the 
random  variable  in  each  place  is  the  state  of  the  process  at  a  given  time  index,  b)  Discrete  two-dimensional  Markov 
random  field  with  a  generator  that  specifies  the  field  value  in  terms  of  three  neighbors.  The  upper  and  leftmost  field 
samples  are  taken  as  boundary  conditions,  c)  Multi-scale  model  of  a  one-dimensional  process.  This  structure  is 
highly  replicated,  but  is  more  general  than  that  of  Figure  2. 


Figure 6: A Simple Markovian Net. Arcs denote explicit statistical dependences; many implicit dependencies must
factor into estimation algorithm design as well. For example, x10 and x11 share common dependencies on x2 and
x3. Special cases of this net useful for checking results include those where transitions simply copy all inputs to
the output, or where outputs are independent of the inputs.

Figure 6 shows a small Markovian net useful for illustrating the fundamental issues to be
addressed by estimation algorithms. It resembles a fragment of the net in Figure 5(c), with four
“scales” of data. Of particular importance is the fact that x10 and x11 share statistical dependencies
at the coarsest scale (x2), an intermediate scale (x5), and a fine scale (x8). When a measurement of
x11 becomes available, the estimate of x10 must be somehow revised. If estimation algorithms are
to exploit the regularity of multi-scale models, then the algorithm must strip out some of the fine
level detail as an update to the estimate of x8, pass some remaining information back to the
intermediate scale to update the estimate of x5, and then to the coarsest scale to update the
estimate of x2. Moreover, it must then construct an estimate for x10 which properly combines
the revised x2, x5, and x8 estimates. To make matters more complex, it must separate information
about x2 from information about x3, both of which influence the measurement of x11.

This  example  will  carry  through  the  following  discussion  of  the  general  results  of  this 
effort; the algorithms described there will be transferred to the much more complex structures
required for multi-scale models of one- and two-dimensional stochastic processes in a later
section. 

GENERAL  RESULTS 

The  objective  of  an  estimation  algorithm  is  to  compute  estimates  of  imperfectly  known 
random  variables  in  response  to  a  measurement  taken  of  one  of  them.  Since  Markovian  nets  use  a 
probabilistic description of uncertainty, these estimates ideally come from properly updated,
conditional probability distributions. The algorithms which result from this work specify how to
compute those distributions, regardless of any specific structural assumptions (e.g., self-similarity,
linearity,  or  Gaussianness).  The  algorithms  are  basically  straightforward  applications  of 
Chapman-Kolmogorov  prediction,  and  Bayesian  update,  equations  to  a  set  of  probability 
distributions. 
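
For discrete random variables, the two basic manipulations can be sketched as below. The dictionary representation of distributions and the function names predict and update are assumptions made for illustration; they are not the algorithms of this work, which also determine which distributions to manipulate.

```python
def predict(prior, transition):
    """Chapman-Kolmogorov step: p(y) = sum over x of p(y | x) p(x).

    prior:      dict x -> p(x)
    transition: dict x -> dict y -> p(y | x)
    """
    predicted = {}
    for x, px in prior.items():
        for y, pyx in transition[x].items():
            predicted[y] = predicted.get(y, 0.0) + pyx * px
    return predicted


def update(prior, likelihood):
    """Bayes step: p(x | z) is proportional to p(z | x) p(x).

    likelihood: dict x -> p(z | x) for the measurement z actually observed
    """
    unnormalized = {x: likelihood[x] * px for x, px in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("measurement has zero probability under the prior")
    return {x: w / total for x, w in unnormalized.items()}
```

Both functions operate on a single distribution at a time; the harder question, taken up next, is which collection of such distributions must be maintained.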

The  key  research  issue  addressed  in  this  work  was  not  how  to  update  distributions,  but 
rather what is the proper set of probability distributions to manipulate? The unimaginative
approach to estimation on Markovian nets works with the joint distribution on all of the random
variables. This approach fails to exploit any simplifications made possible by the topology of the
net itself, and it is well known that much simpler approaches are available for some of the
network  topologies  shown  in  Figure  5. 

Two  important  algorithms  resulting  from  this  work  identify  the  proper  set  of  distributions 
to  manipulate.  The  first  algorithm  finds  a  set  of  distributions  sufficient  to  allow  one  to  construct 
estimates  of  all  of  the  random  variables  given  distributions  on  variables  in  the  minimal  (initial) 
places  of  the  graph,  and  the  transition  probability  distributions.  This  set  is  also  adequate  for 
updating  distributions  based  on  a  measurement  of  one  of  the  initial  variables.  The  second 
algorithm augments the first set with some additional distributions required to backpropagate
information from an intermediate or terminal place to other random variables. This section presents
neither the graph theoretic algorithms which determine the requisite set of distributions, nor the
estimation algorithms themselves; its objective is to convey the types of manipulations performed by
the algorithms and an intuitive justification for them.

Prediction 

Given  a  Markovian  net,  what  are  the  best  estimates  of  the  random  variables  that  can  be 
made  prior  to  a  measurement?  This  is  the  prediction  problem,  and  is  answered  in  the  fullest  by 
providing  a  probability  distribution  for  each  of  the  random  variables  in  the  net.  Such  a  probability 
distribution is not given as part of the net; it must be derived from the distributions on the variables
in  the  minimal  places,  and  from  the  transition  probability  distributions. 

The  example  net  in  Figure  6  shows  why  computation  of  these  probability  distributions  is 
not immediate. Consider x11. It is a function of both x8 and x9, and perhaps some exogenous
noise. To compute the distribution on x11, one must in general have the joint distribution on x8,
x9, and the noise. If these are all independent, one can construct the joint distribution in a simple
manner. In this case, they are not: x8 and x9 share a common dependence on x2, so (except in
degenerate cases) they are not independent. Hence one must compute the joint distribution on both
x8 and x9 in order to arrive at a distribution for x11.

Figure 7 shows the set of probability distributions which must be computed as intermediate
steps towards finding prior distributions on each of the random variables in the net. It shows a
sequence of computations of probability distributions which is sufficient to derive estimates of each
random variable in the Markovian net. For example, the right half of the diagram shows the
intermediate steps needed to compute a distribution on x11. The first major product of this effort is
a  graph  theoretic  algorithm  to  derive  such  a  net  for  any  given  Markovian  net,  along  with  the 
equations  specifying  each  computation  in  terms  of  basic  manipulations  on  arbitrary  probability 
distributions. 


Figure  7:  Sufficient  Statistics  for  Prediction.  Places  enclose  sets  of  random  variables  found  in  the  underlying 
Markovian  net  of  Figure  6.  These  are  the  sets  on  which  probability  distributions  must  be  computed.  Grey 
transitions  compute  the  output  distribution  as  the  product  of  two  marginal  distributions.  White  transitions 
marginalize  a  joint  distribution  to  obtain  a  distribution  on  one  random  variable.  Placement  of  these  is  somewhat 
arbitrary.  Black  transitions  transform  an  input  distribution  to  an  output  distribution  using  a  transition  distribution 
from  the  underlying  net.  The  labels  on  these  transitions  indicate  the  corresponding  transition  in  Figure  6. 

An  important  corollary  captures  the  special  features  of  linear-Gaussian  nets.  In  these,  the 
output  of  each  transition  (which  may,  of  course,  be  a  vector)  is  restricted  to  be  a  linear  combination 
of  the  inputs,  to  which  an  exogenous  Gaussian  random  variable  is  added.  Also,  the  distributions 
on  the  variables  in  the  initial  places  must  be  Gaussian.  In  this  case,  all  of  the  distributions 
necessary  for  prediction  remain  Gaussian,  so  means  and  covariances  of  these  distributions  suffice. 
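
The linear-Gaussian case can be illustrated with a short sketch that propagates means and covariances through a transition of the form y = A x + w. The helper names matmul and propagate, and the small fragment with a shared ancestor u feeding v and w, which in turn feed y, are hypothetical; the fragment is not the net of Figure 6, only a reduced example of two inputs that share a common dependence.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def propagate(mean, cov, A, Q):
    """Push N(mean, cov) through y = A x + w, with w ~ N(0, Q) independent of x."""
    new_mean = [sum(A[i][k] * mean[k] for k in range(len(mean)))
                for i in range(len(A))]
    A_transposed = [list(col) for col in zip(*A)]
    spread = matmul(matmul(A, cov), A_transposed)
    new_cov = [[spread[i][j] + Q[i][j] for j in range(len(Q))]
               for i in range(len(Q))]
    return new_mean, new_cov


# Hypothetical fragment: u feeds both v and w, which together feed y.
mean_u, cov_u = [0.0], [[1.0]]
mean_vw, cov_vw = propagate(mean_u, cov_u, [[1.0], [1.0]],
                            [[0.5, 0.0], [0.0, 0.5]])
mean_y, cov_y = propagate(mean_vw, cov_vw, [[1.0, 1.0]], [[0.25]])

# Treating v and w as independent drops the off-diagonal covariance terms.
naive_var_y = cov_vw[0][0] + cov_vw[1][1] + 0.25
```

In this fragment the variance of y computed from the joint covariance of v and w is larger than the value obtained by treating them as independent, which is why the joint distribution on the two inputs has to be carried along.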

The  result  of  this  part  of  the  work  is  a  specific  algorithm  to  identify  that  part  of  a  Markovian 
network's  structure  which  can  be  exploited  to  simplify  the  prediction  problem.  The  degree  of 
simplification  obtained  depends,  of  course,  on  the  topology  of  the  net. 

Update 

Given  a  measurement  of  the  random  variable  in  an  input  place,  the  algorithm  mentioned 
above can compute all of the posterior distributions. One only needs to replace the original
distribution at that place with the posterior distribution after a Bayesian update, and repeat the
prediction process. If the measurement is taken elsewhere, however, things get more complicated.

To  see  that  the  distributions  identified  in  Figure  7  are  insufficient  for  the  update  process, 
consider a measurement of x11. Figure 7 suggests that it can be used to update the distribution on
x11, and then the joint distribution on x8 and x9, and so on back to the joint distribution on x2 and
x3. From the latter, one can construct an updated distribution on x2 alone, which then can be
propagated along the other half of the graph as if a direct measurement of x2 had resulted in the
update. The problem with this approach is that the update of the joint distribution on x8 and x9
resulted in an (implied) update to the distribution on x8, but this does not get factored into the
computation of the updated distribution on x7 and x8 in the prediction phase. Somehow, the
update to the distribution on x8 must be transferred between the two sides of the graph using some
other mechanism.

That  mechanism  involves  some  additional  statistics  which  augment  those  required  for 
prediction  alone.  Called  crosslink  statistics,  these  preserve  information  on  variables  which  appear 
in two or more unordered places in the prediction graph. The second major product of this work is
another graph theoretic algorithm which determines where crosslink statistics are needed, along
with two algorithms which respectively initialize and update them. The combination of the
prediction statistics and the crosslink statistics is sufficient to update all of the distributions on
individual  random  variables  in  the  Markovian  net. 


Figure 8: Sufficient Statistics for Updates. The two crosslink places, L2 and L5, correspond to the two transitions
in the Markovian net which are referenced more than once in the prediction net. The statistics associated with these
places can take one of several forms, but all preserve information on the random variables common to the input and
output places of the connected transitions. In particular, L5 statistics preserve information on x8. When a
measurement from a place is obtained, the directionality of some arcs must be reversed, and added to crosslink arcs,
to determine the order of the update computations.

Figure 8 shows the crosslinks required for the example of Figure 6. These statistics are
adequate regardless of the origin of a measurement, although the order in which the distributions
are updated does depend on the origin. Note that the crosslink statistics transfer information not
only about x2 and x8, but also about x5. Referring back to Figure 6, x5 is also in the common past
of x10 and x11, so a measurement of x11 should update an estimate of x5, which in turn should
affect the estimate of x10.

As  with  the  prediction  problem,  Gaussianness  is  preserved  under  updates  from  linear- 
Gaussian  measurements.  In  this  case,  the  updated  distributions  remain  Gaussian,  and  are 
completely  determined  from  their  means  and  covariances. 
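
As a hedged illustration of the update step in the linear-Gaussian case, the sketch below conditions a pair of jointly Gaussian variables (a, b) on a noisy measurement of a. It shows how a measurement of one variable revises the estimate of another through their covariance, and that the result is again described by a mean and a covariance. The function name, the two-variable setting, and the scalar measurement model z = a + v are assumptions made for the example; they are not taken from the algorithms of this paper.

```python
def gaussian_update(mean, cov, z, r):
    """Condition a 2-variable Gaussian (a, b) on the measurement z = a + v.

    mean: [m_a, m_b]; cov: [[P_aa, P_ab], [P_ab, P_bb]].
    v is zero-mean Gaussian noise with variance r, independent of (a, b).
    Returns the posterior mean and covariance of (a, b).
    """
    p_aa, p_ab = cov[0][0], cov[0][1]
    p_bb = cov[1][1]
    s = p_aa + r                       # variance of the measurement residual
    gain = [p_aa / s, p_ab / s]        # gains applied to a and to b
    resid = z - mean[0]                # measurement minus its prediction
    new_mean = [mean[0] + gain[0] * resid, mean[1] + gain[1] * resid]
    new_cov = [
        [p_aa - gain[0] * p_aa, p_ab - gain[0] * p_ab],
        [p_ab - gain[1] * p_aa, p_bb - gain[1] * p_ab],
    ]
    return new_mean, new_cov
```

The variable b is never measured directly; its estimate moves only because P_ab is nonzero, which is the role the joint statistics play in the update.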

SPECIFIC  RESULTS 

With  a  general  theory  of  estimation  on  Markovian  nets  available,  estimation  for  multi-scale 
models  becomes  a  special  case.  When  applied  to  Markovian  net  descriptions  of  one-  and  two- 
dimensional  random  fields  which  have  a  high  degree  of  structural  regularity,  it  is  not  surprising 
that  the  resulting  estimation  algorithms  are  also  highly  regular. 


One-Dimensional  Processes 

Figure 2 showed a Markovian network representation for a multi-scale model of Brownian
motion. It also suggested a much richer class of stochastic processes, as the transition
distributions need not be based on linear-Gaussian transformations. Since the general results in
estimation on Markovian nets described above do not rely upon linearity or Gaussianness,
sufficient statistics for estimation on arbitrary multi-scale models of one-dimensional processes
may be identified.
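
A minimal sketch of a midpoint-deflection synthesis of Brownian motion on the unit interval may help fix ideas. It assumes standard Brownian motion with B(0) = 0 and B(1) drawn from N(0, 1), and uses the standard fact that, given the values at the endpoints of an interval of length d, the midpoint is Gaussian with mean equal to the endpoint average and variance d/4. The function name and the array layout are hypothetical.

```python
import random


def brownian_midpoint(scales, seed=0):
    """Sample Brownian motion on [0, 1] at 2**scales + 1 points, coarse to fine."""
    rng = random.Random(seed)
    n = 2 ** scales
    values = [None] * (n + 1)
    values[0] = 0.0
    values[n] = rng.gauss(0.0, 1.0)          # B(1) ~ N(0, 1)
    step = n
    while step > 1:
        half = step // 2
        length = step / n                     # length of the current intervals
        sigma = (length / 4.0) ** 0.5         # conditional std. dev. of the midpoint
        for left in range(0, n, step):
            right = left + step
            average = 0.5 * (values[left] + values[right])
            values[left + half] = average + rng.gauss(0.0, sigma)
        step = half
    return values
```

Each pass of the loop is one scale: every midpoint depends only on the two endpoints of its interval, which is the scale-to-scale Markov structure the text relies on.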


Figure 9: Sufficient Statistics For Prediction In a One-Dimensional Multi-Scale Model. Place labels indicate values
of the process at specific points in time. Transition labels indicate the time to which a midpoint generated by that
transition corresponds. Note that each sample in the interior of the unit interval appears in two places.

Figure 9 shows the network required for prediction in one-dimensional multi-scale
processes. The sufficient statistics are simply joint distributions on process values at interval
endpoints. They are connected in a tree structure, showing that the regularity of the lattice in
Figure 2 imparts a regularity to the prediction process. Note that this diagram does not represent a
Markovian net; the operation of transitions with identical labels is not independent, as each
generates the value of the process at the midpoint of the interval associated with its input.

Figure 10 shows the crosslinks necessary for updates. The crosslinks connect places
representing neighboring intervals. They account for the fact that a measurement of a process
value is statistically related to the process values at the endpoints of that interval, and of all other
intervals in which it is nested. Therefore, the update process propagates outward from the
measurement point (through coarser scales) as estimates of endpoints over the enclosing intervals
are revised, and then inwards (through finer scales) as estimates of values inside other intervals are
updated to reflect revisions in the endpoint estimates.


Figure 10: Sufficient Statistics for Update In a One-Dimensional Multi-Scale Model. Crosslinks connect places
containing duplicate process values. Crosslink statistics may be merged with the prediction statistics at the
immediately preceding places to reduce this structure to a tree. The complexity of the update statistics is
independent of the number of scales included in the model.

The most important property of these multi-scale models is that the complexity of the
sufficient statistics does not depend on the number of scales included in the model. This limits the
computational effort required to perform an update, and the size of the data structures required to
support that process. In fact, for processes that suit the representation of Figure 2, the complexity
of the update process, for a single measurement, is linear in the total number of sample points.

What  this  work  does  not  address  is  the  question  of  parameterization  of  the  distributions  on 
the  variables  in  the  places  in  Figure  10.  For  linear-Gaussian  models  of  the  type  illustrated  in 
Figure  4,  all  required  distributions  are  Gaussian.  For  other  processes,  finite  parameterization  of 
the  conditional  distributions  at  each  place  may  not  be  possible. 

Nonetheless, other forms of the transition probability distributions are important. For
example, Figure 11 shows a highly nonlinear transition mechanism to construct values of the
process at successive midpoints. This divides the unit interval into segments of constant process
values. Segment boundaries can be determined through scale-to-scale causality by repeatedly
assigning the midpoint of an interval either to one of the segments in which an endpoint is
contained, or to a new segment. By restricting new segments to appear only when an interval’s
endpoints are in different segments, segments remain connected. The process value over each
segment is determined by the value assigned to the first point that falls into it. Figure 12 shows a
typical sample of one of these midpoint selection processes.

a) Equal endpoints force equal midpoint
b) Unequal endpoints permit either midpoint selection or new midpoint generation


Figure 11: Example of Nonlinear Midpoint Construction. For each interval, the value of the process at the midpoint
is determined as a function of the values at the endpoints of the interval. (a) If the endpoint values are equal, the
midpoint value is set equal to them. (b) If the endpoint values differ, then with some specified probability the
midpoint may take on a completely new value drawn from some distribution (e.g., uniform over the range of the
process). If it does not, then one of the endpoints is selected, and its value copied to the midpoint.
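
The construction rule in Figure 11 can be sketched as follows. The probability `p_new` of starting a new segment, the equal-odds choice between the two endpoints, and the uniform distribution on [0, 1) for new values are assumptions made for the example; the default endpoint values 0.5 (left) and 0.0 (right) are those quoted for Figure 12. The function name is hypothetical.

```python
import random


def midpoint_selection(scales, left=0.5, right=0.0, p_new=0.3, seed=0):
    """Sample a piecewise-constant process on 2**scales + 1 points, coarse to fine."""
    rng = random.Random(seed)
    n = 2 ** scales
    values = [None] * (n + 1)
    values[0], values[n] = left, right
    step = n
    while step > 1:
        half = step // 2
        for a in range(0, n, step):
            b = a + step
            if values[a] == values[b]:
                mid = values[a]                          # (a) equal endpoints force the midpoint
            elif rng.random() < p_new:
                mid = rng.random()                       # (b) start a new segment
            else:
                mid = rng.choice((values[a], values[b])) # (b) extend a neighboring segment
            values[a + half] = mid
        step = half
    return values
```

Because a new value can only be introduced between two differing endpoints, each value ends up occupying one contiguous run of sample points, which is the connectedness property described in the text.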


Figure 12: Sample Function of a Nonlinear Midpoint Construction Process. The eight coarsest scales of the
evolution of a sample function from a midpoint selection process show how new segments are created when their
first point is constructed, and extended by the assignment of midpoint values to one of the endpoint values. The
endpoint values at scale 1 are set to 0.5 (left) and 0.0 (right) at the start of the construction.

The sufficient statistics for a midpoint selection process remain the same as for the midpoint
deflection, or Brownian motion, processes: the joint distributions on process values at the
endpoints of a nested set of intervals. Unlike the linear-Gaussian case, there is no obvious finite
parameterization of these distributions. If, however, the number of levels which can be assigned
to segments is small, then it is possible to store this joint distribution explicitly.

With the requisite distributions on hand, estimation algorithms for one-dimensional
multi-scale processes become straightforward specializations of the general algorithms. For the
processes discussed here, the algorithms are in fact equivalent to known algorithms for estimation
on tree structures.

Two-Dimensional  Processes 


Image processing requires treatment of two-dimensional random fields, not just
one-dimensional processes. Two-dimensional fields also admit a multi-scale Markovian net
representation, although the topology of the Markovian nets becomes somewhat more complex.


Figure 13: Markovian Net Representation of a Quadrant Decomposition for a Two Dimensional Random Field. a)
Each quadrant inherits one corner value from a coarser scale, and three from the current scale. These are inputs into
operators which determine process values at boundary midpoints and the center of the quadrant. b) Nesting quadrants
permits refinement through an arbitrary number of scales. Note that each value of the process ultimately influences
the construction of eight other process values.

Figure 13 illustrates one way to represent a two-dimensional field with a multi-scale
Markovian net. The unit square is the domain of interest, and it can be divided into four equal
quadrants. These quadrants may be recursively decomposed in a similar manner to give a quadtree
structure [7]. However, the random variables of the Markovian net cannot be associated with just
these nested quadrants, or the Markovian scale-to-scale independence assumption would preclude
exact matches at boundaries between quadrants. Instead, each quadrant can be characterized by the
values of the random field at its four corners, and one-dimensional multi-scale processes inserted
to describe the boundaries between quadrants.

As a synthesizer of samples of random fields, the model of Figure 13 operates as follows.
Assume process values have already been determined at the four corners of a quadrant. Generate a
value for the midpoint of each edge of the quadrant using the one-dimensional midpoint
construction techniques presented earlier. Note that these midpoint values will be available to finer
scale operations affecting both quadrants on either side of the boundary. Finally, generate a value
for the process at the center point of the quadrant, and notice that all information necessary to
enable the construction of the process in each sub-quadrant is now present.
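
The synthesis procedure just described can be sketched on a square grid. This is an illustration under stated assumptions, not the generator of Figure 13(a): edge midpoints use a Gaussian midpoint deflection, the center is taken as the average of the four corners plus Gaussian noise (the text does not specify the center rule), and the noise standard deviation shrinks by a hypothetical factor `decay` per scale. The function name is also hypothetical.

```python
import random


def synthesize_field(scales, sigma=1.0, decay=0.5, seed=0):
    """Fill a (2**scales + 1) square grid quadrant by quadrant, coarse to fine."""
    rng = random.Random(seed)
    n = 2 ** scales
    f = [[None] * (n + 1) for _ in range(n + 1)]
    for r in (0, n):
        for c in (0, n):
            f[r][c] = rng.gauss(0.0, sigma)          # four corners of the unit square
    step, s = n, sigma
    while step > 1:
        half = step // 2
        s *= decay
        for r in range(0, n, step):
            for c in range(0, n, step):
                r2, c2, rm, cm = r + step, c + step, r + half, c + half
                edges = (
                    (r, cm, f[r][c], f[r][c2]),      # top edge midpoint
                    (r2, cm, f[r2][c], f[r2][c2]),   # bottom edge midpoint
                    (rm, c, f[r][c], f[r2][c]),      # left edge midpoint
                    (rm, c2, f[r][c2], f[r2][c2]),   # right edge midpoint
                )
                for pr, pc, a, b in edges:
                    # Generate each edge midpoint once, so that the two
                    # quadrants sharing the edge see the same value.
                    if f[pr][pc] is None:
                        f[pr][pc] = 0.5 * (a + b) + rng.gauss(0.0, s)
                corners = f[r][c] + f[r][c2] + f[r2][c] + f[r2][c2]
                f[rm][cm] = 0.25 * corners + rng.gauss(0.0, s)  # quadrant center
        step = half
    return f
```

After each pass, every sub-quadrant has values at its four corners, so the same step can be applied again at the next finer scale.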

Figure  14  shows  the  complete  network  for  a  three-scale  model  of  a  two-dimensional 
random  field.  It  consists  of  21  copies  of  the  generating  structure  from  Figure  13(a),  nested  in  a 
quadtree  structure.  Unlike  conventional  quadtree  representations,  however,  it  includes  embedded 
one-dimensional  processes  along  quadrant  boundaries.  The  samples  generated  by  these  processes 
affect  the  generation  of  samples  in  both  quadrants  at  finer  levels.  This  way,  the  sample  fields 
described  by  the  model  may  remain  continuous,  and  avoid  the  boundary  artifacts  of  tree  models. 


Figure 14: Top Three Scales of a Markovian Net Representation of a Two-Dimensional Field. The topology of the
net is quite sparse and highly regular, although it cannot be reduced to an equivalent tree model. Note that each
boundary of this net is just a one-dimensional multi-scale model, and that other one-dimensional models appear
along internal boundaries.

This representation of a two-dimensional random field is clearly not a tree. Nor can it be
reduced to a tree by some manipulation, such as augmentation of the set of variables stored at each
place, without making the number of variables stored at each place proportional to the depth of the
model. Since the complexity of the statistics required by the optimal estimation algorithms is at
least proportional to the complexity of the set of variables in the places, such a structural reduction
would have unacceptable impacts on algorithm complexity. This is the model structure which
motivated the work on a general approach to estimation on Markovian nets described earlier.

The graph theoretic algorithms which identify the structure of an estimator for a Markovian
net can be applied directly to this network. Figure 15 illustrates one element of the structure
required for prediction: an element corresponding to one quadrant. Several copies of this
structure must be combined into a tree in order to prescribe the prediction structure for an entire
model.


Figure 15: Prediction Statistics for One Quadrant's Decomposition. Each 5x5 array represents 25 samples of the
random field at the finest scale shown for the generator of Figure 13(a). Darkened cells indicate which sample values
are included in the set at each place. Transition labels refer to the transitions in the standard generator of Figure
13(a).

To understand the structure of these prediction statistics, first examine the four boundaries
of the quadrant. Each of these is a one-dimensional process. The prediction statistics for
one-dimensional processes appeared in Figure 9, and turned out to be distributions on process values at
the two endpoints of an interval. This is precisely what Figure 15 shows for the exterior
boundaries: each place holds process values for the endpoints of a segment at the coarser level (at
the top of each boundary tree fragment) or at the finer level (at the bottom of each fragment)
corresponding to the two half-intervals.


Figure 16: Update Structure for Two-Dimensional Processes. Crosslinks appear as curved arcs, without places. In
this net, crosslinks connect more than two transitions due to the complexity of the connections in the basic
Markovian net model. Connected transitions correspond to the same transition in Figure 13(a), and all introduce the
same sample value into their output places.

The 17 places in the interior of the diagram also form a tree. The root of the tree contains
sample values at the four corners of the quadrant. Four of the leaves contain the sample values at
the four corners of the subquadrants, allowing these structures to be concatenated into a full tree.
Another four leaves are each the root of a further tree for the one-dimensional processes that form
the boundaries between the subquadrants. The remaining 8 places are intermediate steps needed
since each transition only introduces one new variable into the statistic set at a time.

To complete the treatment of two-dimensional fields, one must add the crosslinks which
convey update information between the branches of the prediction tree. These appear in Figure 16.
As with the one-dimensional case, the crosslink arcs have no directionality here; that is imposed
by the source of a measurement. The purpose of the crosslinks is the same as discussed above:
when information from a measurement is propagated up the prediction tree, the part of it that
pertains to the variables introduced at this scale must be preserved for the time when update
information propagates back down the other branches of the prediction tree.

The most important feature of this update structure is that, once again, the complexity of the
statistics required for prediction and update is independent of the number of scales included in the
model. Thus the computational load required to update all variables in the model from a
measurement taken at one point is proportional to the number of nodes in the prediction tree, which
in turn is proportional to the number of pixels in the two-dimensional image.

Recall that the objective of this work was to develop accurate and efficient estimation
algorithms based on multi-scale models of two-dimensional random processes. This objective has
been met: the algorithms introduced here are the most accurate possible, as they explicitly
reconstruct the conditional probability distribution on each variable in the process. As can now be
seen, they are also extremely efficient, as the computational effort required to process an update is
simply proportional to the size of the process.

APPLICATIONS 

Work is not complete, however. Establishing the general structure of estimation algorithms
for multi-scale models of two-dimensional random processes is an important, but not complete,
step towards the ultimate goal of real-time image reconstruction, fusion, and identification. To see
where additional work is necessary, consider some potential applications.

Problem  Requirements 

There are three broad classes of application for this work: static image analysis, dynamic
image analysis, and image fusion. All but the first impose real-time computing constraints, and
hence require a close match between algorithm structure and available computing structures. Most
importantly, however, one can now consider novel approaches to classical problems which can be
posed as estimation problems within the structure investigated here, and rest assured that the
solutions will inherit the computational regularity and simplicity of the general algorithms
developed to date.

First, consider static image analysis. Important problems in this area include segmentation,
texture identification and classification, and anomaly detection. All three problems can be posed in
terms of multi-scale Markovian network models. Midpoint construction models provide a
statistical characterization of a wide variety of irregular boundaries, such as shorelines. Different
coefficients in linear-Gaussian transformations can lead to a wide range of spatially-invariant
textures, such as pasture, woodland, or ocean surfaces. Combinations of the two provide models
of landscapes whose sample fields are strikingly realistic. An estimator built for these models
could not only reconstruct estimates of the field values at all sample points, but also estimate model
parameters and segment boundaries. Such algorithms will require computationally realizable
approximations to the exact algorithms derived here, analogous to the generalized likelihood ratio
techniques for failure detection (as a solution to the segmentation problem), the extended Kalman
filter (for model parameter identification as part of texture classification), and model mismatch
evaluation (for anomaly detection).

One can view dynamic images as a three- or four-dimensional random field, with the time
dimension having the property that information need only propagate in one direction along it. Too
complex to illustrate, but easily represented in a digital computer, are the multi-scale Markovian net
models for these higher dimensional processes. Temporal causality may allow substantial
simplifications, just as it does in conventional Markov processes. For example, one can use the
model of Figure 14 to describe the initial frame of an image sequence, and also (with different
parameters) to describe the changes from frame to frame. Tracking moving boundaries and
detecting changes in textures become two-dimensional analogs of tracking and detection algorithms
for conventional time series.

Finally, image fusion algorithms must process image data at different scales, with different
centerpoints and orientations, often involving different physical phenomenologies, into a
composite representation of an area of interest. Representing different sensor scales is natural in
this modeling environment; unfortunately, most phenomenological models have differential or
spectral foundations instead of an explicit multi-scale structure. Therefore, a model identification
algorithm, again based on linearization techniques similar to the extended Kalman filter, would be
effective in estimating model parameters directly from imagery.

CONCLUSIONS 

This work set out to investigate whether multi-scale models of random fields could lead to
accurate and efficient estimation algorithms. It accomplished that objective, with affirmative
answers. The estimation algorithms compute the exact posterior distributions on process variables
given a measurement taken from any location, at any scale. They operate in time that is
proportional to the size of the field of interest; this efficiency results from the essential fact that the
sufficient statistics for estimation in a multi-scale model are of complexity that is independent of the
number of scales employed.

With respect to practical implementations of the algorithms developed here, questions
outnumber answers. The linear-Gaussian equations provide a finite-dimensional realization of an
optimal estimator for one special case, but others may exist. Nothing is yet known about potential
simplifications when one is presented with a set of measurements taken over a region of the image,
instead of at a single point. Nor have linearized versions of the algorithms for use with mildly
nonlinear problems been constructed. Nor, indeed, has duality been exploited to derive multi-scale
optimization algorithms based on the same model structure.

Therefore, we conclude that the domain of multi-scale stochastic process models is
exceedingly rich. We also conclude that they lead to highly structured and efficient algorithms
which solve a wide class of estimation problems associated with that model. These algorithms
offer the potential of vast simplifications for a variety of image analysis problems, as well as a
basis for extensions into even more important applications such as real-time video tracking.


Abstract. A maximally decimated filter bank system (with possibly unequal decimation ratios in
the subbands) can be regarded as a generalization of the short time Fourier transformer. In fact,
it is known that such a ‘filter bank transformer’ is closely related to the wavelet transformation.
A natural question that arises when we conceptually pass from the traditional Fourier transformer
to the filter bank transformer is: what happens to the convolution theorem, i.e., is there an analog
of the convolution theorem in the world of ‘filter bank transforms’? In this paper we address the
question first for uniform decimation and then generalize it to the nonuniform case. The result
takes a particularly simple and useful form for paraunitary (or orthonormal) filter banks. It shows
how we can convolve two signals $x(n)$ and $g(n)$ by directly convolving the subband signals of a
paraunitary filter bank and adding the results. The advantage of the method is that we can quantize
in the subbands based on the signal variance and other perceptual considerations, as in traditional
subband coding. As a result, for a fixed bit rate, the result of convolution is much more accurate
than direct convolution. That is, we obtain a coding gain over direct convolution. We will derive
expressions for optimal bit allocation and optimal coding gain for such paraunitary convolvers. As
a special case, if we take one of the two signals to be the delta function (e.g., $g(n) = \delta(n)$), we can
recover the well-known bit allocation and coding gain formulas of traditional subband coding.

1. INTRODUCTION

Fig. 1.1(a) shows the $M$-channel maximally decimated digital filter bank, which has been
studied by a number of authors in the past decade. Here $H_k(z)$, $F_k(z)$, $0 \le k \le M-1$ are the
analysis and synthesis filters. The notations $\downarrow n_k$ and $\uparrow n_k$ denote the $n_k$-fold decimator
and interpolator (upsampler) as defined in several earlier references [1]-[5]. The boxes labeled $Q_k$
denote quantizers which quantize the subband signals $x_k(n)$.

The relations between filter banks and wavelet transforms have been known for some time [6]-[12].
An excellent magazine article appeared recently [10], revealing this connection explicitly. It is
well known that wavelet transforms provide more flexibility (in terms of time-frequency resolution)
than the traditional Fourier transform. In this paper we deal only with discrete-time filter banks
(both uniform and nonuniform decimators will be considered). It is known that discrete time filter
banks can be considered as discrete time wavelet transformations. Here the analysis bank can
be viewed as a transformation from ‘time’ to ‘time-frequency’. We will simply refer to this as the
filter bank transform, and the decimated subband signals $x_k(n)$ will be called the transform-domain
signals. The synthesis bank is regarded as the inverse transformer (assuming perfect reconstruction,
that is, $\hat{x}(n) = x(n)$).

Aim  of  the  paper. 

The advent of these transforms leads us to ask the question, “how do the standard properties
of the Fourier transformation generalize to the case of ‘filter-bank transforms’?” For example,
what is the extension of the convolution theorem? To introduce the main topic of the paper,
let $y(n)$ denote the convolution of two sequences $x(n)$ and $g(n)$, i.e., $y(n) = \sum_m x(m) g(n-m)$.
According to Fourier transform theory, the transform of $y(n)$ is related to those of $x(n)$ and $g(n)$
as $Y(e^{j\omega}) = X(e^{j\omega}) G(e^{j\omega})$, i.e., convolution becomes ‘multiplication’ in the transform domain.
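
A small numerical check of the classical convolution theorem, for finite sequences starting at $n = 0$, can be sketched as follows. The function names are hypothetical, and the transform is evaluated at a single frequency $\omega$ directly from its definition.

```python
import cmath


def convolve(x, g):
    """Direct linear convolution y(n) = sum_m x(m) g(n - m)."""
    y = [0.0] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        for j, gj in enumerate(g):
            y[i + j] += xi * gj
    return y


def dtft(seq, omega):
    """Fourier transform of a finite sequence at frequency omega."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(seq))


def check_convolution_theorem(x, g, omega):
    """Return (Y(e^{j omega}), X(e^{j omega}) * G(e^{j omega}))."""
    lhs = dtft(convolve(x, g), omega)
    rhs = dtft(x, omega) * dtft(g, omega)
    return lhs, rhs
```

For any choice of `x`, `g` and `omega`, the two returned values should agree up to floating-point roundoff.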

Now consider the ‘filter bank transformer’, with the decimated subband signals regarded as the
‘transform domain’. What is the ‘convolution theorem’ in this case? To expand on this question,
consider Fig. 1.1 where we show $x(n)$ and $g(n)$ as the inputs to two copies of the filter bank.
The transform domain ‘coefficients’ corresponding to $x(n)$ and $g(n)$ are the sequences $x_k(n)$ and
$g_k(n)$, respectively. How should we combine $x_k(n)$ and $g_k(n)$, $0 \le k \le M-1$, so that the convolution
$\sum_m x(m) g(n-m)$ can be obtained from these, assuming there are no subband quantizers?

In Sec. 2.1 we will derive this convolution theorem for the case of uniform filter banks (i.e.,
$n_k = M$ for all $k$). The result takes an exceptionally simple form in the case of paraunitary filter
banks [2], [12]-[14]. More specifically, the convolution $x(n) * g(n)$ is reduced to computing the
convolutions $x_k(n) * g_k(n)$ and adding. This will be stated more precisely in Theorem 2.1 (equal
$n_k$) and Theorem 2.2 (unequal $n_k$). In Sec. 2.2, the result will be extended to the case of filter
banks with nonuniform decimation ratios. Once again, it will be shown that when the synthesis
filter coefficients form an orthonormal basis (this being the extension of the paraunitary concept),
the ‘convolution theorem’ takes a special simple form. Even though the uniform filter bank is a
special case, we have chosen to treat it separately first, because it is much simpler, while conveying
most of the ideas well.
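
The ‘convolve in the subbands and add’ idea can be illustrated with the simplest orthonormal example, the two-channel Haar bank ($M = 2$). This is a sketch under stated assumptions, not a statement of Theorem 2.1, whose exact form is given later in the paper. It assumes synthesis filters $f_0 = (1, 1)/\sqrt{2}$ and $f_1 = (1, -1)/\sqrt{2}$, subband signals $x_k(m) = \sum_n x(n) f_k(n - 2m)$ for the first input and $g_k(l) = \sum_n g(n) f_k(2l - n)$ for the second, and it relies on the orthonormality of the shifted filters (Parseval's relation), which gives $\sum_k (x_k * g_k)(l) = y(2l)$. The odd-indexed samples are obtained here by repeating the computation with $g$ delayed by one sample, which is a choice made for this example. All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)   # Haar filter coefficient magnitude


def convolve(a, b):
    """Direct linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def at(seq, n):
    """Sample n of a finite sequence, taken as zero outside its support."""
    return seq[n] if 0 <= n < len(seq) else 0.0


def subbands_x(x):
    """x_k(m) = sum_n x(n) f_k(n - 2m) for the Haar filters."""
    count = (len(x) + 1) // 2
    low = [S * (at(x, 2 * m) + at(x, 2 * m + 1)) for m in range(count)]
    high = [S * (at(x, 2 * m) - at(x, 2 * m + 1)) for m in range(count)]
    return low, high


def subbands_g(g):
    """g_k(l) = sum_n g(n) f_k(2l - n) for the Haar filters."""
    count = len(g) // 2 + 1
    low = [S * (at(g, 2 * l) + at(g, 2 * l - 1)) for l in range(count)]
    high = [S * (at(g, 2 * l) - at(g, 2 * l - 1)) for l in range(count)]
    return low, high


def even_samples(x, g):
    """y(0), y(2), ... obtained by convolving subband signals and adding."""
    x0, x1 = subbands_x(x)
    g0, g1 = subbands_g(g)
    return [a + b for a, b in zip(convolve(x0, g0), convolve(x1, g1))]


def convolve_via_subbands(x, g):
    """Full convolution assembled from two subband computations."""
    even = even_samples(x, g)
    odd = even_samples(x, [0.0] + g)[1:]   # a one-sample delay exposes y(1), y(3), ...
    n_out = len(x) + len(g) - 1
    return [at(even, n // 2) if n % 2 == 0 else at(odd, n // 2) for n in range(n_out)]
```

Without quantizers the result matches direct convolution up to roundoff; the interest of the structure lies in what happens once the subband signals are quantized.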

Usefulness. 

The motivation for obtaining these ‘convolution theorems’ does not originate from a desire to
obtain algorithms that are faster than the many well-known fast convolution techniques. (Indeed,
the ‘state of the art’ for fast convolutions is already very advanced.) The actual motivation comes from
the fact that we can quantize in the subbands, and reduce the roundoff error (for fixed wordlength)
by proper bit allocation schemes. Thus, instead of quantizing $x(n)$ and then convolving with $g(n)$,
we can now quantize $x_k(n)$ and then convolve with $g_k(n)$ and add the results for all $k$. When
performing this quantization in subbands, we can exploit the subband energy distribution and
perform optimal bit allocation. In this way, we obtain increased accuracy for a given bit rate. That
is, the system offers a coding gain. This idea is very similar in philosophy to subband coding [15]
(e.g., see Chapter 11 of [16], and Chapter 1 of [17]). In a spirit similar to that described in the above
references, we can define a coding gain for the paraunitary convolver. In Sec. 3 we will present a
detailed study of this coding gain. We will obtain the optimal bit allocation formula, and study the
coding gain under optimal bit allocation. Unlike in usual subband coding, the paraunitary subband
convolver can provide a coding gain > 1 even if $x(n)$ has a flat spectrum (i.e., is white).
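
For orientation, the classical bit allocation and coding gain of traditional subband (or transform) coding can be sketched as follows. These are the textbook high-resolution results for an orthonormal system with $M$ equal-rate channels, not the convolver formulas of this paper, which are derived in Sec. 3. The function names and the example variances are hypothetical.

```python
import math


def optimal_bits(variances, avg_bits):
    """Classical allocation: b_k = b + 0.5 * log2(var_k / geometric mean)."""
    m = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / m   # log2 of the geometric mean
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]


def coding_gain(variances):
    """Classical coding gain: arithmetic mean over geometric mean of the variances."""
    m = len(variances)
    arithmetic = sum(variances) / m
    geometric = math.exp(sum(math.log(v) for v in variances) / m)
    return arithmetic / geometric
```

With equal subband variances the gain is exactly 1, which is why a white input offers no gain in traditional subband coding. Note that the formula can return negative or fractional bit counts, so a practical allocation needs an additional rounding or clipping step.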

It is important to notice that the computation of the subband signals $x_k(n)$ itself involves
filtering. If this filtering complexity is comparable to the direct convolution of $x(n)$ and $g(n)$, then
the above technique is clearly not worthwhile. It has potential applications only when $x(n)$ and $g(n)$ are
very long sequences (in comparison with the lengths of the analysis filters $H_k(z)$). A useful special
case arises when the analysis filters have length $\le M$ (which is analogous to transform coding).
We will see that, even in this case, substantial coding gain can be exhibited.

It  is  meaningful  to  try  to  maximize  the  coding  gain  by  optimization  of  the  coefficients  of  the 
paraunitary  filter  bank  (for  a  fixed  order).  Such  an  optimization  is  easier  to  formulate  in  the  special 
case  where  the  filter  bank  reduces  to  the  transform  coding  system  (to  be  described  in  Fig.  4.1  later, 
where T is a unitary matrix). This is a special case of paraunitary filter banks (with constant
polyphase matrices). We consider it separately, and address the problem of optimal choice of T
(under the optimal bit allocation constraint). This, then, is the generalization of the Karhunen-Loève
transform (KLT) [16],[18],[19] for the case of unitary convolvers. We will formulate the
problem in terms of two autocorrelation matrices, but unlike in the KLT problem, a closed form
solution for T (e.g., in terms of the eigenvectors) is not possible. We will consider a numerical
example based on speech signals, and show that the coding gain of the convolver is very close to
the theoretical upper bound, if the matrix T is taken to be the DCT matrix. This observation
parallels a similar well-known result in orthogonal transform coding of speech [16].

Outline. In Sec. 2.1 we derive the convolution theorem for paraunitary filter banks with
uniform decimation. This is extended to the case of nonuniform filter banks ($n_k$ not identical
for all k) in Sec. 2.2. In this case the paraunitary property is replaced with a generalization
(orthonormality). Section 3 presents a derivation of optimal subband bit allocation, as well as the
corresponding coding gain expression for the paraunitary convolver. Once again, the uniform case
will be considered first, and then generalized to the nonuniform case. Even though the former is
strictly a special case of the latter, we have chosen to treat them separately. This is because of the
simplicity of the uniform case, which at the same time brings out many of the important features.
In Sec. 3.4 we show how the well-known coding gain results for traditional subband systems can
be obtained as special cases of the convolver coding gain expressions. Section 4 considers a further
specialization of the uniform paraunitary convolver, with analysis filter lengths constrained to be
$\le M$. This is, in principle, the extension of the transform coding problem to the case of convolution.
It has the advantage that we can further maximize the coding gain by optimizing the transform
matrix (generalization of the KLT). Section 5 presents several numerical examples, and provides a
relative comparison of the coding gain for different test conditions.


Notations  and  basics. 


1. Boldfaced quantities represent matrices and vectors. The notations $A^T$, $A^*$ and $A^\dagger$ represent,
respectively, the transpose, conjugate, and transpose-conjugate of A. The accent 'tilde', as in
$\tilde{H}(z)$, stands for transposition, followed by conjugation of coefficients, followed by replacement
of z with $z^{-1}$. On the unit circle, $\tilde{H}(z) = H^\dagger(z)$.

2. The M-fold decimator $\downarrow M$ and interpolator $\uparrow M$ (or expander) are defined as in [1],[2]. Thus
the input-output relation for the decimator is y(n) = x(Mn), and for the interpolator it is

\[ y(n) = \begin{cases} x(n/M), & n = \text{integer multiple of } M \\ 0, & \text{otherwise.} \end{cases} \]

In this paper, all decimation and interpolation ratios are positive integers. In equations, the
notation $a(n)|_{\downarrow M}$ denotes the decimated sequence a(Mn). (The vertical bar is omitted where
it is unnecessary.) With A(z) denoting the z-transform of a(n), the notation $A(z)|_{\downarrow M}$ denotes
the z-transform of the decimated version a(Mn). Let A(z) and B(z) be rational functions and
let K and L be integers. The following identity can be easily verified:

\[ \left(A(z^K)B(z)\right)\big|_{\downarrow KL} = \left(A(z)\,\big(B(z)|_{\downarrow K}\big)\right)\big|_{\downarrow L}. \qquad (1.1) \]

3. x(n) * g(n) denotes convolution of x(n) with g(n). The sequence $x(n) * g^*(-n)$ is the
deterministic cross correlation between x(n) and g(n), and has z-transform $X(z)\tilde{G}(z)$.
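The decimator and expander defined in item 2 can be illustrated with a minimal Python sketch. The function names `decimate` and `expand` are illustrative choices, not notation from the paper, and finite lists stand in for sequences that start at n = 0.

```python
def decimate(x, M):
    """M-fold decimator: y(n) = x(Mn)."""
    return x[::M]


def expand(x, M):
    """M-fold expander: y(n) = x(n/M) when n is a multiple of M, else 0."""
    y = [0] * (len(x) * M)
    y[::M] = x          # place the input samples at multiples of M
    return y


x = [1, 2, 3, 4, 5, 6]
print(decimate(x, 2))                    # [1, 3, 5]
print(expand([1, 2, 3], 2))              # [1, 0, 2, 0, 3, 0]
print(decimate(expand(x, 3), 3) == x)    # True: decimation undoes expansion
```

Expansion followed by decimation with the same M returns the input, whereas decimation followed by expansion in general does not.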
Polyphase notation

For the case where $n_k = M$ for all k, the system of Fig. 1.1(a) can be redrawn as in Fig. 1.2
where E(z) and R(z) are M x M matrices. Defining the analysis and synthesis filter vectors as

\[ \mathbf{h}(z) = [H_0(z)\ \ H_1(z)\ \ \ldots\ \ H_{M-1}(z)]^T, \qquad \mathbf{f}(z) = [F_0(z)\ \ F_1(z)\ \ \ldots\ \ F_{M-1}(z)]^T, \qquad (1.2) \]

we have

\[ \mathbf{h}(z) = \mathbf{E}(z^M)\,\mathbf{e}(z), \qquad \mathbf{f}^T(z) = \tilde{\mathbf{e}}(z)\,\mathbf{R}(z^M), \qquad (1.3) \]

where e(z) is the delay chain vector, i.e.,

\[ \mathbf{e}(z) = [1\ \ z^{-1}\ \ \ldots\ \ z^{-(M-1)}]^T. \qquad (1.4) \]

E(z) and R(z) are, respectively, the polyphase matrices of the analysis and synthesis banks.

2.  CONVOLUTION  THEOREMS  FOR  ORTHONORMAL 
FILTER  BANKS 

2.1.  Filter  bank  with  equal  decimation  ratio  in  all  branches 

First consider Fig. 1.1, with $n_k = M$ for all k. The convolution theorem is obtained by
analyzing this in the absence of the quantizers $Q_k$. Assume that the set of filters $\{H_k(z), F_k(z)\}$ are
chosen to satisfy the perfect reconstruction property, i.e.,

\[ \hat{X}(z) = X(z), \qquad \hat{G}(z) = G(z). \qquad (2.1) \]

Using the fact that the M-fold upsamplers have outputs $X_k(z^M)$ and $G_k(z^M)$, we can express $\hat{X}(z)$
as $\sum_k X_k(z^M)F_k(z)$, and similarly for $\hat{G}(z)$. Using these together with (2.1) we obtain

\[ X(z) = \sum_{k=0}^{M-1} X_k(z^M)F_k(z), \qquad G(z) = \sum_{k=0}^{M-1} G_k(z^M)F_k(z). \qquad (2.2) \]

Now consider the quantity $X(z)\tilde{G}(z)$ (with the 'tilde' notation as defined at the end of Sec. 1). We
have

\[ X(z)\tilde{G}(z) = \sum_{k=0}^{M-1}\sum_{m=0}^{M-1} X_k(z^M)\,\tilde{G}_m(z^M)\,F_k(z)\,\tilde{F}_m(z). \qquad (2.3) \]

The inverse z-transform of $X(z)\tilde{G}(z)$ is equal to the convolution of x(n) with $g^*(-n)$ (i.e., the
deterministic cross correlation between x(n) and g(n)). Similarly $X_k(z)\tilde{G}_m(z)$ represents the
convolution of the subband signals $x_k(n)$ and $g_m^*(-n)$.

Paraunitary  or  orthonormal  filter  banks. 

The above equation reduces to a much simpler form (the double summation reduces to a single
summation) when the filter bank is paraunitary [12]-[14]. In this case the polyphase matrix E(z)
satisfies

\[ \tilde{\mathbf{E}}(z)\mathbf{E}(z) = \mathbf{I}, \qquad (2.4) \]

and we choose $\mathbf{R}(z) = \tilde{\mathbf{E}}(z)$ for perfect reconstruction (so that R(z) is also paraunitary). In this
case the analysis and synthesis filters are related as $F_k(z) = \tilde{H}_k(z)$, that is, $f_k(n) = h_k^*(-n)$.

In order to ensure that $F_k(z)$ is stable, we assume that the analysis filters are FIR. Thus, $h_k(n)$
and $f_k(n)$ are FIR with the same length. A paraunitary filter bank satisfies the following properties,
regardless of the exact nature of $H_k(e^{j\omega})$ (i.e., regardless of filter quality) [12].

1. The energy of each analysis filter equals unity, that is, $\int_0^{2\pi} |H_k(e^{j\omega})|^2\, d\omega/2\pi = 1$.

2. The analysis filters are power complementary, that is, $\sum_{k=0}^{M-1} |H_k(e^{j\omega})|^2 = M$ for all $\omega$.

3. Since $f_k(n) = h_k^*(-n)$, we have $|H_k(e^{j\omega})| = |F_k(e^{j\omega})|$. So the above two properties hold for
the synthesis filters as well.

(Notice, in particular, that in the case of ideal brickwall filters, to be shown later in Fig. 3.3, the
first two properties are evident.) The paraunitary property of R(z) is equivalent to the property
that the synthesis filters satisfy an orthonormality condition [10]-[12], that is,

\[ \sum_n f_k(n)\,f_m^*(n + Mi) = \delta(k-m)\,\delta(i). \qquad (2.5) \]

In the z-domain this can be rewritten as

\[ \left(F_k(z)\tilde{F}_m(z)\right)\big|_{\downarrow M} = \delta(k-m). \qquad (2.6) \]

Simplification of the convolution formula

Using the above orthonormality condition, Eq. (2.3) leads to

\[ \left(X(z)\tilde{G}(z)\right)\big|_{\downarrow M} = \sum_{k=0}^{M-1} X_k(z)\,\tilde{G}_k(z). \qquad (2.7) \]

This can be rewritten in the time domain as

\[ \left(x(n) * g^*(-n)\right)\big|_{\downarrow M} = \sum_{k=0}^{M-1} x_k(n) * g_k^*(-n). \qquad (2.8) \]

In the time domain, the left hand side represents the M-fold decimated version of the
convolution of x(n) with $g^*(-n)$. The kth term on the right hand side represents the convolution of the
subband signal $x_k(n)$ with $g_k^*(-n)$. Summarizing, we have

Theorem 2.1. Paraunitary convolution theorem. Consider the two copies of a maximally
decimated filter bank as in Fig. 1.1, with FIR analysis and synthesis filters, and $n_k = M$ for all
k. Ignore the quantizers $Q_k$. Assume that the system has perfect reconstruction ($\hat{x}(n) = x(n)$ for
any x(n)) and that the polyphase matrix E(z) (Fig. 1.2) is paraunitary (equivalently the synthesis
filters are orthonormal, i.e., satisfy (2.5) or equivalently (2.6)). Then the M-fold decimated version
of the convolution $x(n) * g^*(-n)$ can be computed by computing the convolutions $x_k(n) * g_k^*(-n)$,
$0 \le k \le M-1$, and adding them. ◇

In order to obtain all the samples of the convolution $x(n) * g^*(-n)$, we have to repeat the
above operation M times, by replacing g(n) with g(n - i), for $0 \le i \le M-1$. We can represent
these operations mathematically as

\[ \left(z^i X(z)\tilde{G}(z)\right)\big|_{\downarrow M} = \sum_{k=0}^{M-1} X_k(z)\,\tilde{G}_k^{(i)}(z), \qquad 0 \le i \le M-1, \qquad (2.9) \]

where $G_k^{(i)}(z)$ is the subband signal obtained by replacing g(n) with g(n - i). Assuming that x(n)
is an input sequence and that g(n) is a fixed filter, the quantities $G_k^{(i)}(z)$ are fixed (i.e., can be
precomputed).
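The theorem and its shifted form (2.9) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the two-channel (M = 2) bank built from the length-4 Daubechies filter as one convenient real paraunitary example (the paper does not single out this bank), random real test sequences, and helper names (`analysis`, `xcorr`) invented for the example.

```python
import math
import random


def analysis(x, h, M):
    """Subband signal x_k(n) = sum_m h(m) x(Mn - m), returned as {n: value}."""
    N, Lh = len(x), len(h)
    out = {}
    for n in range((N + Lh - 2) // M + 1):
        out[n] = sum(h[m] * x[M * n - m]
                     for m in range(Lh) if 0 <= M * n - m < N)
    return out


def xcorr(a, b):
    """c(n) = sum_l a(l) b(l - n), i.e. a(n) * b(-n), for real {index: value} data."""
    c = {}
    for la, va in a.items():
        for lb, vb in b.items():
            c[la - lb] = c.get(la - lb, 0.0) + va * vb
    return c


# Assumed example bank: Daubechies length-4 lowpass and its alternating flip.
s3 = math.sqrt(3.0)
h0 = [v / (4 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
h1 = [(-1) ** n * h0[3 - n] for n in range(4)]
bank, M = [h0, h1], 2

random.seed(0)
x = [random.gauss(0, 1) for _ in range(40)]
g = [random.gauss(0, 1) for _ in range(40)]

direct = xcorr(dict(enumerate(x)), dict(enumerate(g)))   # x(n) * g(-n)

max_err = 0.0
for i in range(M):
    g_shift = [0.0] * i + g                # g(n - i)
    sub = {}
    for h in bank:
        for n, v in xcorr(analysis(x, h, M), analysis(g_shift, h, M)).items():
            sub[n] = sub.get(n, 0.0) + v
    # (2.9): the subband sum should equal samples Mn + i of the direct result
    keys = set(sub) | {(n - i) // M for n in direct if (n - i) % M == 0}
    for n in keys:
        max_err = max(max_err, abs(sub.get(n, 0.0) - direct.get(M * n + i, 0.0)))

print(max_err)   # expected to be at rounding-error level
```

Running the loop over i = 0 and i = 1 recovers the even and odd samples of the cross correlation, which is the repetition over shifted versions of g(n) described above.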

Application in decimation filtering. As a special situation, imagine that $g^*(-n)$ is a
decimation filter for x(n). This means that the result of the convolution $x(n) * g^*(-n)$ is decimated
by some factor D. In this case, we do not have to repeat (2.9) for all values of i. For example, if
D = M/2, we only have to perform (2.9) for i = 0 and i = M/2.

Comments on complexity. Computational complexity is not the main advantage of the
method of subband convolution. Assume for simplicity that x(n) and g(n) are N-point sequences.
Then direct convolution of x(n) and $g^*(-n)$ (without using standard fast techniques) requires $N^2$
multiplications. Assuming that N is much larger than the lengths of the subband filters $H_k(z)$ (so that
the multiplications required to implement the analysis filters are negligible), the signals $x_k(n)$ and $g_k(n)$
have lengths $\approx N/M$. Each subband convolution requires nearly $(N/M)^2$ multiplications, so that
the total number of multiplications for all M subbands and all the M values of i in (2.9) is nearly $N^2$ again. It is true
that we can employ the FFT, or even the fast 'short convolution algorithms' in the subbands, but
again this is not the main point of the discussion.
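As a worked instance of this count, take the illustrative values N = 1024 and M = 8 (chosen here only for the arithmetic; they do not come from the paper). Each subband signal then has about 128 samples, and there are M subbands for each of the M shifts:

```latex
\[
\underbrace{M}_{\text{subbands}} \times \underbrace{M}_{\text{shifts } i}
\times \left(\frac{N}{M}\right)^{2}
= 8 \times 8 \times 128^{2} = 1\,048\,576 = 1024^{2} = N^{2}.
\]
```

The subband method therefore matches, rather than beats, the multiplication count of direct convolution under these assumptions.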

The above reasoning does not hold if the analysis filters have lengths comparable to those of
x(n) and g(n). In this case, the complexity of the analysis bank becomes comparable to the direct
convolution of x(n) with g(n), and the method is not useful.

The actual advantage of the (paraunitary) subband convolver is that it allows us to allocate the
computational accuracy (i.e., bits) among the subbands, resulting in a coding gain as elaborated in
Sec. 3. In fact, considerable coding gain can be obtained even in the special case where the analysis
filters have small length (e.g., $\le M$), as discussed in Sections 4 and 5.

2.2.  Orthonormal  filter  bank  with  unequal  decimation  ratios 

Now consider the case where the decimation ratios $n_k$ in Fig. 1.1 are possibly unequal integers
such that

\[ \sum_{k=0}^{M-1} \frac{1}{n_k} = 1. \qquad (2.10) \]

This condition implies that we have a maximally decimated system. The design of such systems
has received attention recently [20], [21]. Such a system can be regarded as a discrete time wavelet
decomposition system. The analysis bank is the 'wavelet transformer' and the synthesis bank the
inverse transformer. Assuming perfect reconstruction (i.e., $\hat{x}(n) = x(n)$) we can express the signal
x(n) in terms of the synthesis filters $F_k(z)$ and the wavelet coefficients as follows:

\[ X(z) = \sum_{k=0}^{M-1} F_k(z)\,X_k(z^{n_k}), \qquad (2.11) \]

i.e., in the time domain,

\[ x(n) = \sum_{k=0}^{M-1}\sum_{\ell} x_k(\ell)\,f_k(n - n_k\ell). \qquad (2.12) \]

The doubly indexed set of sequences

\[ \eta_{k,\ell}(n) = f_k(n - n_k\ell) \qquad (2.13) \]

are therefore the 'basis functions' for the expansion of x(n).


Orthonormality (nonuniform case). The above basis is said to be orthonormal if

\[ \sum_n \eta_{k,\ell}(n)\,\eta_{m,i}^*(n) = \delta(k-m)\,\delta(\ell-i). \qquad (2.14) \]

In terms of the synthesis filters, the orthonormality property is

\[ \sum_n f_k(n)\,f_m^*(n + n_k\ell - n_m i) = \delta(k-m)\,\delta(\ell-i). \qquad (2.15) \]

This is a generalization of the orthonormality property (2.5) which followed earlier from
paraunitariness. Let $n_{k,m}$ denote the greatest common divisor of $n_k$ and $n_m$, i.e.,

\[ n_{k,m} = \gcd(n_k, n_m). \qquad (2.16) \]

We can then rewrite (2.15) as [22]

\[ \sum_n f_k(n)\,f_m^*(n + n_{k,m}\,p) = \delta(k-m)\,\delta(p) \qquad (2.17) \]

(see Appendix A). In the z-domain this is equivalent to

\[ \left(F_k(z)\tilde{F}_m(z)\right)\big|_{\downarrow n_{k,m}} = \delta(k-m). \qquad (2.18) \]

A simple example of a perfect reconstruction orthonormal filter bank with unequal $n_k$ is obtained
by use of a binary tree structure with paraunitary polyphase matrices at each level [10]-[12]. This
results in filter responses that have an octave spacing.

Derivation of the convolution theorem (nonuniform case).

Assume that we have perfect reconstruction, i.e., $\hat{X}(z) = X(z)$ and $\hat{G}(z) = G(z)$. Using the
expression (2.11) for X(z) and similarly for G(z), we have

\[ X(z)\tilde{G}(z) = \sum_{k=0}^{M-1}\sum_{m=0}^{M-1} F_k(z)\,\tilde{F}_m(z)\,X_k(z^{n_k})\,\tilde{G}_m(z^{n_m}). \qquad (2.19) \]

Let L be the least common multiple of the decimation ratios, i.e.,

\[ L = \mathrm{lcm}\,\{n_k\}. \qquad (2.20) \]

For $0 \le k, m \le M-1$ we then have

\[ L = n_k\,p_k, \qquad L = n_{k,m}\,p_{k,m} \qquad (2.21) \]

for some integers $p_k$ and $p_{k,m}$. Consider now the L-fold decimated version of $X(z)\tilde{G}(z)$. Using the
above decomposition of L and the identity (1.1), we can write

\[ \left(X(z)\tilde{G}(z)\right)\big|_{\downarrow L} = \sum_{k=0}^{M-1}\sum_{m=0}^{M-1} \left( \left(F_k(z)\tilde{F}_m(z)\right)\big|_{\downarrow n_{k,m}}\, X_k(z^{n_k/n_{k,m}})\,\tilde{G}_m(z^{n_m/n_{k,m}}) \right)\Big|_{\downarrow p_{k,m}}. \qquad (2.22) \]

Using the orthonormality property (2.18) this simplifies to

\[ \left(X(z)\tilde{G}(z)\right)\big|_{\downarrow L} = \sum_{k=0}^{M-1} \left(X_k(z)\,\tilde{G}_k(z)\right)\big|_{\downarrow p_k}. \qquad (2.23a) \]

Equivalently, in the time domain,

\[ \left(x(n) * g^*(-n)\right)\big|_{\downarrow L} = \sum_{k=0}^{M-1} \left(x_k(n) * g_k^*(-n)\right)\big|_{\downarrow p_k}. \qquad (2.23b) \]

To obtain all the samples of the convolution $x(n) * g^*(-n)$, we have to repeat the above with the
shifted versions g(n - i), $0 \le i \le L-1$. This result is summarized as follows.

Theorem 2.2. Convolution theorem for orthonormal nonuniform filter banks. Consider the
maximally decimated filter bank of Fig. 1.1, and ignore the quantizers $Q_k$. Let

\[ L = \mathrm{lcm}\,\{n_k\}, \qquad n_{k,m} = \gcd(n_k, n_m), \qquad p_k = L/n_k \quad \text{and} \quad p_{k,m} = L/n_{k,m}. \qquad (2.24) \]

Assume that the system has perfect reconstruction ($\hat{x}(n) = x(n)$ for any x(n)) and that the synthesis
filters are orthonormal, i.e., satisfy (2.17) or equivalently (2.18). Then the L-fold decimated version
of the convolution $x(n) * g^*(-n)$ can be computed by computing the $p_k$-fold decimated versions of
the convolutions $x_k(n) * g_k^*(-n)$, and adding them. We can obtain all samples of the convolution
by repeating this for L successively shifted versions of g(n). ◇
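The integers in (2.24) are easy to tabulate for a given set of decimation ratios. The following sketch assumes a hypothetical four-channel octave-band system with ratios 8, 8, 4, 2; the ratios and variable names are chosen for illustration only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


n = [8, 8, 4, 2]                                   # assumed decimation ratios n_k
maximally_decimated = sum(Fraction(1, nk) for nk in n) == 1    # condition (2.10)

L = reduce(lcm, n)                                 # L = lcm{n_k}
p = [L // nk for nk in n]                          # p_k = L / n_k
n_km = [[gcd(a, b) for b in n] for a in n]         # n_{k,m} = gcd(n_k, n_m)
p_km = [[L // v for v in row] for row in n_km]     # p_{k,m} = L / n_{k,m}

print(maximally_decimated, L, p)                   # True 8 [1, 1, 2, 4]
```

Here the subband convolution in the channel with $n_k = 2$ is decimated by $p_k = 4$ before the sum in (2.23b), while the two channels with $n_k = 8$ are used without further decimation.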

From a computational complexity viewpoint, the comments following Eqn. (2.9) continue to
hold. It can be shown that the number of multiplications for a direct convolution $x(n) * g^*(-n)$ is
nearly the same as the total number of multiplications required to perform all the necessary subband
convolutions. (This neglects the multiplications required to implement the analysis filters $H_k(z)$
and assumes that the lengths of $h_k(n)$ are much smaller than those of x(n) and g(n).) The coding
gain of the nonuniform orthonormal convolver will be derived in Sec. 3.3.

3.  CODING  GAIN  OF  PARAUNITARY  CONVOLVERS 

Fig. 1.1 shows the paraunitary convolver with quantizers inserted in the subbands of x(n). We
will first consider the uniform case ($n_k = M$ for all k). The nonuniform case will be addressed in
Sec. 3.3. Assume that g(n) is a fixed filter with no quantizers in its subbands. (This assumption
can be removed, but only with considerable loss of simplicity of mathematics.) For simplicity of
analysis we assume that x(n), g(n) and the filter coefficients in $H_k(z)$ are real so that $x_k(n)$ and
$g_k(n)$ are real. This enables us to deal with quantizers that operate on real inputs.

Let $b_k$ denote the number of bits per sample of $x_k(n)$, permitted by the quantizer $Q_k$. Thus
the average bit rate is

\[ b = \frac{1}{M}\sum_{k=0}^{M-1} b_k, \qquad (3.1) \]

i.e., on the average, we have used b bits per sample of x(n).

Because of the quantization in the subbands, the output of the paraunitary convolver is different
from the ideal result $x(n) * g^*(-n)$. To analyze this error, we replace the quantizers $Q_k$ with the
noise sources $q_k(n)$ as shown in Fig. 3.1. Consider the paraunitary convolution formula (2.8). In
the presence of quantizers, we are actually computing

\[ \sum_{k=0}^{M-1} \left(x_k(n) + q_k(n)\right) * g_k^*(-n). \qquad (3.2) \]

(According to the realness assumption the conjugate sign is redundant, but we show it for
consistency with previous sections.) The quantization error is therefore

\[ q(n) = \sum_{k=0}^{M-1} q_k(n) * g_k^*(-n). \qquad (3.3) \]
fc=0 

The  noise  model 

To perform a statistical analysis, we will make the following assumptions:

1. x(n) is a zero-mean wide sense stationary (WSS) random process so that the subband signals $x_k(n)$
are zero-mean WSS with some variance, say, $\sigma_{x_k}^2$. We consider g(n) to be a deterministic
sequence.

2. The quantization error $q_k(n)$ is zero-mean and white, with variance $\sigma_{q_k}^2$. Also $q_k(n)$ is
uncorrelated with $q_m(i)$, $k \ne m$, and with the input x(n) (hence with the quantizer input $x_k(n)$).

It should be noticed that the above assumptions are reasonable as long as the bit rates $b_k$ are
moderate or high [23]. In any case, in the absence of such assumptions, it is not usually possible
to find an expression for the error variance.

3.1.  Expression  for  the  error  variance 

Let $\sigma_{x_k}^2$ denote the variance of $x_k(n)$, and $\sigma_{q_k}^2$ the variance of the quantizer error $q_k(n)$. In
order to equalize the overflow probability across all the M subbands, these two should be related as

\[ \sigma_{q_k}^2 = c\,2^{-2b_k}\,\sigma_{x_k}^2 \qquad (3.4) \]

(See, e.g., Chap. 4 of [16]). Here c is a constant, identical for all subbands (which is a valid
assumption if all $x_k(n)$ have a similar type of distribution, e.g., all Gaussian).

Using (3.3) and the noise model assumptions stated earlier, the variance of q(n) can be
expressed as

\[ \sigma_q^2 = \sum_{k=0}^{M-1} \sigma_{q_k}^2 \sum_n g_k^2(n) = c \sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2 \sum_n g_k^2(n). \qquad (3.5) \]

This is for i = 0 in (2.9). For arbitrary i, the filter g(n) is replaced with g(n - i), and the above
equation is modified to

\[ \sigma_q^2(i) = c \sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2 \sum_n \left[g_k^{(i)}(n)\right]^2, \qquad (3.6) \]

where $g_k^{(i)}(n)$ is the kth subband output in response to g(n - i). The dependence on i is removed
by averaging over all i. The resulting average variance of q(n) is given by

\[ \sigma_{q,PU}^2 = \frac{c}{M} \sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2\,\alpha_k^2, \qquad (3.7) \]

where

\[ \alpha_k^2 = \sum_{i=0}^{M-1} \sum_n \left[g_k^{(i)}(n)\right]^2. \qquad (3.8) \]

The inner summation above represents the energy in the kth subband in response to the input
g(n - i). The outer summation removes the dependence on i. Thus $\alpha_k^2$ is proportional to the
average energy of g(n) in the kth subband. Using the paraunitary property, it can be shown that
$\sum_k \alpha_k^2 / M$ is the total energy in g(n) (see Eqn. (3.22) later).

The 'PU' in the subscript in (3.7) is a reminder of 'paraunitary'. Equation (3.7) gives the
average error variance (over a period of length M), and is independent of time.
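The quantities $\alpha_k^2$ in (3.8) can be computed directly from a filter bank and a filter g(n). The sketch below makes several assumptions for illustration: the bank is the two-channel Daubechies length-4 paraunitary pair (not a bank specified in the paper), g(n) is a short hypothetical lowpass filter, and the helper names are invented. It also checks the statement that $\sum_k \alpha_k^2 / M$ equals the energy of g(n).

```python
import math


def analysis(x, h, M):
    """Subband signal x_k(n) = sum_m h(m) x(Mn - m), returned as {n: value}."""
    N, Lh = len(x), len(h)
    out = {}
    for n in range((N + Lh - 2) // M + 1):
        out[n] = sum(h[m] * x[M * n - m]
                     for m in range(Lh) if 0 <= M * n - m < N)
    return out


def alpha_sq(g, bank, M):
    """alpha_k^2 = sum_{i=0}^{M-1} sum_n [g_k^{(i)}(n)]^2, following (3.8)."""
    result = []
    for h in bank:
        total = 0.0
        for i in range(M):
            g_shift = [0.0] * i + list(g)          # g(n - i)
            total += sum(v * v for v in analysis(g_shift, h, M).values())
        result.append(total)
    return result


# Assumed example bank: Daubechies length-4 lowpass and its alternating flip.
s3 = math.sqrt(3.0)
h0 = [v / (4 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
h1 = [(-1) ** n * h0[3 - n] for n in range(4)]

g = [0.25, 0.5, 0.25]                              # hypothetical lowpass g(n)
a2 = alpha_sq(g, [h0, h1], 2)
energy = sum(v * v for v in g)

print(a2)                      # alpha_0^2 dominates for this lowpass g(n)
print(sum(a2) / 2, energy)     # the two numbers should agree
```

Because g(n) is lowpass, most of its energy falls in the lowpass subband, so the two $\alpha_k^2$ values differ; this unevenness is what the bit allocation in Sec. 3.2 exploits.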


3.2.  Coding  gain  of  the  paraunitary  convolver 

Now consider direct convolution $x(n) * g^*(-n)$. Suppose x(n) is directly quantized to b bits
before convolution. Denoting e(n) as the quantization error, the result of quantization is
$[x(n) + e(n)] * g^*(-n)$ so that the error is $e(n) * g^*(-n)$. Under usual noise model assumptions, the variance
of this error is

\[ \sigma_{q,PCM}^2 = \sigma_e^2 \sum_n |g(n)|^2, \qquad (3.9) \]

where $\sigma_e^2$ is the variance of e(n), which can be expressed, similar to (3.4), as $\sigma_e^2 = c\,2^{-2b}\sigma_x^2$, where
$\sigma_x^2$ is the variance of x(n). Thus

\[ \sigma_{q,PCM}^2 = c\,2^{-2b}\,\sigma_x^2 \sum_n |g(n)|^2. \qquad (3.10) \]


The ratio

\[ G_{PU}(M) = \frac{\sigma_{q,PCM}^2}{\sigma_{q,PU}^2} \qquad (3.11) \]

is the coding gain of the paraunitary convolver. The argument M is a reminder that there are M
subbands in the system. Substituting from (3.7) and (3.10), this becomes

\[ G_{PU}(M) = \frac{2^{-2b}\,\sigma_x^2 \sum_n |g(n)|^2}{\frac{1}{M}\sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2\,\alpha_k^2}. \qquad (3.12) \]

In this expression, $\sigma_{x_k}^2$ is the variance of the kth subband signal derived from the input x(n), and
$\alpha_k^2 \ge 0$ is related to the kth subband of the filter g(n). And b is the average bit rate (3.1). Notice
that $\sigma_{x_k}^2$ and $\alpha_k^2$ depend on the analysis filter response $H_k(e^{j\omega})$.


Optimum  bit  allocation 

Under the average bit-rate constraint (3.1), we can maximize the coding gain by optimally
allocating the bits $b_k$ among subbands. The idea is very similar to the counterpart in subband
coding [16]. For this we note that the numerator of (3.12) is independent of the bit allocation.
We only have to minimize the denominator. For this we invoke the arithmetic-geometric mean
inequality [24] (AM-GM inequality) which says this: if $P_k$, $0 \le k \le M-1$, is a set of nonnegative
numbers, then

\[ \frac{1}{M}\sum_{k=0}^{M-1} P_k \ \ge\ \left(\prod_{k=0}^{M-1} P_k\right)^{1/M} \qquad (3.13) \]

with equality if and only if $P_k = P$ for all k. Using this in conjunction with (3.1) we can show that

\[ \frac{1}{M}\sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2\,\alpha_k^2 \ \ge\ 2^{-2b}\left(\prod_{k=0}^{M-1} \sigma_{x_k}^2\,\alpha_k^2\right)^{1/M} \qquad (3.14) \]

with equality if and only if all terms on the left side above are equal. Since the quantizer variances
are given by (3.4), we see that the above condition for equality implies

\[ \sigma_{q_k}^2 = c\,\sigma_{x_k}^2\,2^{-2b_k} = \frac{\text{constant}}{\alpha_k^2}. \]

The output noise variance due to the kth quantizer (kth term in (3.7)) is therefore independent of
k.

We obtain the formula for the optimal bit allocation by setting all the terms on the left side
of (3.14) to be equal. The result is

\[ b_k = C + \tfrac{1}{2}\log_2\left(\sigma_{x_k}^2\,\alpha_k^2\right) \qquad (3.15) \]

for some C. By using (3.1) we can eliminate C and obtain

\[ b_k = b + 0.5\log_2\left(\sigma_{x_k}^2\,\alpha_k^2\right) - \frac{0.5}{M}\sum_{i=0}^{M-1}\log_2\left(\sigma_{x_i}^2\,\alpha_i^2\right). \qquad (3.16) \]

This is very similar to the expressions which can be found in [16], [17] for traditional subband
coding systems. The difference is that the product $\sigma_{x_k}^2\alpha_k^2$ appears in the place of $\sigma_{x_k}^2$. Thus, the
energy of the signal as well as that of the filter g(n) in the kth subband determine the bits $b_k$. For high
bit rate coding, the above expression is useful. As in subband coding, $b_k$ might turn out to be
nonintegral, and sometimes negative if b is not large enough.
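A minimal sketch of the allocation rule (3.16) follows. The subband variances and $\alpha_k^2$ values are made-up numbers chosen for illustration, and the function name `optimal_bits` is not from the paper.

```python
import math


def optimal_bits(var_x, alpha2, b):
    """Bit allocation (3.16) for an average rate of b bits per sample."""
    logs = [0.5 * math.log2(v * a) for v, a in zip(var_x, alpha2)]
    mean_log = sum(logs) / len(logs)
    return [b + l - mean_log for l in logs]


var_x = [4.0, 1.0, 0.25, 0.0625]      # hypothetical subband variances
alpha2 = [2.0, 1.0, 0.5, 0.5]         # hypothetical alpha_k^2 values
bits = optimal_bits(var_x, alpha2, b=4.0)

# Each term 2^{-2 b_k} sigma_{x_k}^2 alpha_k^2 of (3.14) should come out equal.
terms = [2.0 ** (-2 * bk) * v * a for bk, v, a in zip(bits, var_x, alpha2)]

print(bits)     # [6.125, 4.625, 3.125, 2.125], averaging to b = 4
print(terms)    # four equal values
```

The allocation is fractional here, as noted above; in practice some rounding rule would be needed, which this sketch does not attempt.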


Optimum coding gain. The optimum coding gain is obtained when equality holds in (3.14),
i.e., when all the terms on the left side of (3.14) are equal. This optimum value is

\[ G_{PU,optimal}(M) = \frac{\sigma_x^2 \sum_n |g(n)|^2}{\left(\prod_{k=0}^{M-1}\sigma_{x_k}^2\right)^{1/M}\left(\prod_{k=0}^{M-1}\alpha_k^2\right)^{1/M}}. \qquad (3.17) \]

Notice that the above analysis holds for any filter-bank convolver with paraunitary polyphase
matrix, regardless of the quality of the filter responses. The filter responses will in turn
determine the values of $\sigma_{x_k}^2$ and $\alpha_k^2$ for fixed g(n) and x(n).


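Lemma 3.1. $G_{PU,optimal}(M) \ge 1$ regardless of the choice of paraunitary filters $H_k(z)$. ◇

Proof. We will rewrite the optimal coding gain (3.17) by expressing $\sigma_x^2$ in terms of $\sigma_{x_k}^2$ and
$\sum_n |g(n)|^2$ in terms of $\alpha_k^2$.

The variance of the output of $H_k(z)$ is also the variance of the decimated subband signal $x_k(n)$
so that

\[ \sigma_{x_k}^2 = \int_0^{2\pi} S_{xx}(e^{j\omega})\,|H_k(e^{j\omega})|^2\,\frac{d\omega}{2\pi}, \qquad (3.18) \]

where $S_{xx}(e^{j\omega})$ is the power spectral density of x(n). The paraunitary property $\tilde{\mathbf{E}}(z)\mathbf{E}(z) = \mathbf{I}$
implies $\sum_k |H_k(e^{j\omega})|^2 = M$. By summing (3.18) over k we therefore obtain

\[ \sigma_x^2 = \frac{1}{M}\sum_{k=0}^{M-1}\sigma_{x_k}^2. \qquad (3.19) \]

Similarly, with $\mathbf{g}^{(i)}(n) = [g_0^{(i)}(n)\ \ \ldots\ \ g_{M-1}^{(i)}(n)]^T$ denoting the vector of subband signals
in response to g(n - i), the paraunitary property implies that energy is preserved, i.e.,

\[ \sum_n |g(n-i)|^2 = \sum_n \left[\mathbf{g}^{(i)}(n)\right]^\dagger \mathbf{g}^{(i)}(n). \qquad (3.21) \]

The left hand side is the energy $\sum_n |g(n)|^2$. Combining this with the definition (3.8) of $\alpha_k^2$, we
obtain

\[ \sum_n |g(n)|^2 = \frac{1}{M}\sum_{k=0}^{M-1}\alpha_k^2. \qquad (3.22) \]

Substituting (3.19) and (3.22) into (3.17), we arrive at

\[ G_{PU,optimal}(M) = \frac{\frac{1}{M}\sum_{k=0}^{M-1}\sigma_{x_k}^2}{\left(\prod_{k=0}^{M-1}\sigma_{x_k}^2\right)^{1/M}} \times \frac{\frac{1}{M}\sum_{k=0}^{M-1}\alpha_k^2}{\left(\prod_{k=0}^{M-1}\alpha_k^2\right)^{1/M}}. \qquad (3.23) \]

Using the arithmetic-geometric mean inequality we conclude that $G_{PU,optimal}(M) \ge 1$. ∇ ∇ ∇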


Notice that the above proof uses the paraunitary property. The property $G_{PU,optimal}(M) \ge 1$
cannot be claimed for a convolver based on an arbitrary filter bank (i.e., without the paraunitary
property). The appearance of the arithmetic-geometric mean ratio in the coding gain has been
observed in other contexts in traditional subband coding applications. It has been formally proved
for the case of ideal brickwall filters and for the case of orthogonal transform coding [16]. Such an
expression has also been used for other types of (nonideal) filter banks [25]. The true justification
for such use is based on the paraunitary property, as shown above and in [26].
In general the gain can exceed unity for two possible reasons. First, the subband variances $\sigma_{x_k}^2$
could be different for different k. And second, the quantity $\alpha_k^2$ may not be the same in all subbands.


Special  cases. 

Paraunitary filter banks are special cases of a more general class of perfect reconstruction filter
banks [2],[3]. However, they cover a wide range of practical filter banks. In fact, some of the
approximate reconstruction systems (viz., the pseudo QMF banks [27]-[30]) are known to satisfy
the paraunitary property 'approximately' (see [31]), even though these approximate systems were
developed before paraunitary filter banks were reported.

1. A special case of paraunitary systems, primarily of theoretical interest, arises when the filters
are equispaced ideal brickwall filters as shown in Fig. 3.3. In this case

\[ H_k(e^{j\omega}) = \begin{cases} \sqrt{M} & \text{if } \omega \in k\text{th passband} \\ 0 & \text{otherwise,} \end{cases} \qquad (3.24) \]

and it can be shown that $\mathbf{E}(e^{j\omega})$ is paraunitary (see Sec. 6.2.2 of [12]). In this case, we have

\[ \sigma_{x_k}^2 = M \int_{k\text{th band}} S_{xx}(e^{j\omega})\,\frac{d\omega}{2\pi}, \qquad (3.25) \]

where $S_{xx}(e^{j\omega})$ is the power spectrum of x(n). Thus, the coding gain is greater than unity as
long as x(n) does not have the same 'variance' in all the consecutive frequency bands. The system
(3.24) will be called the ideal SBC (subband coding) convolver.

2. A second special case of theoretical interest arises when $H_k(z) = z^{-k}$ for all k. In this case
the above results still hold (since E(z) = I which is paraunitary), and the coding gain can be
verified to be unity.

3. Case of white input. Now consider the special case where x(n) is zero-mean and white. Let
the response of the filters be arbitrary except for the FIR paraunitary property. Then $\sigma_{x_k}^2$
is identical for all k. This follows because paraunitariness implies, in particular, that the
energy $\int_0^{2\pi} |H_k(e^{j\omega})|^2\,d\omega/2\pi$ is identical for all k (Appendix B). In this case, the coding gain
can still exceed unity, because $\alpha_k^2$ may not be identical for all k.
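The white-input case can be illustrated by evaluating the optimal gain in its AM/GM form (3.23). The sketch assumes strictly positive, made-up values for the subband variances and the $\alpha_k^2$; the function name `optimal_gain` is illustrative.

```python
import math


def optimal_gain(var_x, alpha2):
    """Optimal coding gain in the form (3.23): a product of two AM/GM ratios.

    Assumes all entries are strictly positive so the geometric means exist.
    """
    M = len(var_x)

    def am(v):
        return sum(v) / M

    def gm(v):
        return math.exp(sum(math.log(t) for t in v) / M)

    return (am(var_x) / gm(var_x)) * (am(alpha2) / gm(alpha2))


flat = [1.0, 1.0, 1.0, 1.0]                 # white input: equal subband variances
gain_flat = optimal_gain(flat, flat)        # equal alpha_k^2 as well
gain_white = optimal_gain(flat, [2.0, 1.0, 0.5, 0.5])   # hypothetical unequal alpha_k^2

print(gain_flat)    # 1.0
print(gain_white)   # greater than 1 although the input is white
```

With equal subband variances the first ratio is unity, so any gain comes entirely from the uneven distribution of the filter energy across subbands.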
3.3. Coding gain for the nonuniform orthonormal convolver

In the nonuniform case, Eqn. (2.23) gives the L-fold decimated version of the convolution. To
obtain all samples of the convolution, we repeat this operation with g(n) replaced by g(n - i), i.e.,
$g_k(n)$ replaced by $g_k^{(i)}(n)$, for $0 \le i \le L-1$.

With quantizers inserted as in Fig. 1.1(a), we can replace them with noise sources $q_k(n)$ as in
Fig. 3.1. With x(n) and $g^{(i)}(n) = g(n-i)$ used as the filter bank inputs, the error in the computation of
$\left(x(n) * [g^{(i)}(-n)]^*\right)|_{\downarrow L}$ is therefore $\sum_{k=0}^{M-1}\left(q_k(n) * [g_k^{(i)}(-n)]^*\right)|_{\downarrow p_k}$. Proceeding as before, we find the variance
of this error to be

\[ \sum_{k=0}^{M-1} \sigma_{q_k}^2 \sum_n \left[g_k^{(i)}(n)\right]^2. \qquad (3.26) \]

Averaging over the L values of i, we obtain the average variance of the error q(n) in the convolution
as

\[ \sigma_{q,\perp}^2 = \frac{1}{L}\sum_{i=0}^{L-1}\sum_{k=0}^{M-1}\sigma_{q_k}^2\sum_n\left[g_k^{(i)}(n)\right]^2 = \frac{c}{M}\sum_{k=0}^{M-1} 2^{-2b_k}\,\sigma_{x_k}^2\,\alpha_k^2. \qquad (3.27) \]

The subscript $\perp$ stands for 'orthonormal' filter banks. This is the 'output error variance' of the
convolver. Here we have used (3.4). Also, we have defined

\[ \alpha_k^2 = \frac{M}{L}\sum_{i=0}^{L-1}\sum_n\left[g_k^{(i)}(n)\right]^2. \qquad (3.28) \]

The bit rate for the kth subband is $b_k/n_k$. Assume that the total bit rate is constrained to be
b. Then the bit rate constraint is

\[ \sum_{k=0}^{M-1} \frac{b_k}{n_k} = b. \qquad (3.29) \]

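To obtain the optimal bit allocation, we can minimize $\sigma_{q,\perp}^2$ under the above constraint by use of
the Lagrange multiplier method. That is, form the Lagrangian $\phi = \sigma_{q,\perp}^2 - \lambda\left(\sum_{k=0}^{M-1} b_k/n_k - b\right)$ and
set $\partial\phi/\partial b_k = 0$. This results in the set of equations

\[ 2^{2b_k} = D\,n_k\,\sigma_{x_k}^2\,\alpha_k^2, \qquad 0 \le k \le M-1, \qquad (3.30) \]

where D is a constant independent of k.¹ Taking logarithms and using (3.29), we can evaluate the
constant D to be

\[ D = \frac{2^{2b}}{\prod_{i=0}^{M-1}\left(n_i\,\sigma_{x_i}^2\,\alpha_i^2\right)^{1/n_i}}. \qquad (3.31) \]

Substituting this into (3.30) and taking logarithms, we obtain the following formula:

\[ b_k = b + 0.5\log_2\left(n_k\,\sigma_{x_k}^2\,\alpha_k^2\right) - 0.5\sum_{i=0}^{M-1}\frac{1}{n_i}\log_2\left(n_i\,\sigma_{x_i}^2\,\alpha_i^2\right) \qquad (3.32) \]

for optimal bit allocation. Under this condition, the variance of the kth quantizer noise is given by

\[ \sigma_{q_k}^2 = c\,2^{-2b_k}\,\sigma_{x_k}^2 = \frac{c}{D\,\alpha_k^2\,n_k}, \qquad (3.33) \]

which is proportional to $1/(\alpha_k^2 n_k)$. With optimum bit allocation, the output noise variance
contributed by the kth quantizer (kth term in (3.27)) simplifies to $c/(DMn_k)$, and is proportional to
$1/n_k$. The total output noise variance is

\[ \sigma_{q,\perp}^2 = \frac{c}{DM}\sum_{k=0}^{M-1}\frac{1}{n_k} = \frac{c}{DM} = \frac{c\,2^{-2b}}{M}\prod_{i=0}^{M-1}\left(n_i\,\sigma_{x_i}^2\,\alpha_i^2\right)^{1/n_i}. \qquad (3.34) \]

The coding gain, defined as $G_\perp(M) = \sigma_{q,PCM}^2/\sigma_{q,\perp}^2$, can now be calculated. Thus, using (3.10) and
(3.34) we obtain

\[ G_\perp(M) = \frac{M\,\sigma_x^2\sum_n |g(n)|^2}{\prod_{i=0}^{M-1}\left(n_i\,\sigma_{x_i}^2\,\alpha_i^2\right)^{1/n_i}}. \qquad (3.35) \]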
Notice that these results reduce to those in Sec. 3.2 if we set $n_k = M$ for all k. Another special case
of interest in many applications (speech and image coding) is the filter bank with analysis filter
responses as in Fig. 3.4. The responses have an octave spacing and correspondingly increasing
bandwidths (constant Q filter bank). Such a system can be generated by use of a tree-structured
system, where one of the two signals from the previous stage is further split into two in the next
stage [10], and so forth. The orthonormality property can be satisfied in such a system by use of
2 x 2 paraunitary polyphase matrices at each level of the tree. The above theory can be applied
for these systems, with

\[ n_0 = n_1 = 2n_2 = 4n_3 = \cdots \]

¹ The fact that this represents a minimum rather than a maximum can be verified in many ways.
For example, one can verify in this case that the Hessian of the Lagrangian [32] is a diagonal matrix
with positive elements.


3.4.  The  special  case  of  traditional  subband  coding 


The results derived above for the paraunitary convolvers (uniform as well as nonuniform) can be used to derive the optimal bit allocation and coding gain for orthonormal subband coding systems, i.e., systems of the form in Fig. 1.1(a). This is done by setting $g(n) = \delta(n)$. Under this condition, the quantity $g_k^{(i)}(n)$ is the decimated version of $h_k(n - i)$, where $h_k(n)$ is the impulse response of the analysis filter $H_k(z)$. Using the fact that the analysis filters have unit energy under the paraunitary constraint, one can verify that $\alpha_k^2 = M/n_k$. Substituting this we obtain the reconstruction error variance, i.e., the variance of $\hat{x}(n) - x(n)$ in Fig. 1.1(a). This can be obtained from (3.27) as

$$\sigma_\perp^2 = \sum_{k=0}^{M-1} \frac{\sigma_{q_k}^2}{n_k}. \tag{3.36}$$

The optimal bit allocation rule reduces to

$$b_k = b + 0.5\log_2\left(\sigma_{x_k}^2\right) - 0.5\sum_{i=0}^{M-1}\frac{1}{n_i}\log_2\left(\sigma_{x_i}^2\right), \tag{3.37}$$

and the optimized coding gain becomes

$$G_\perp(M) = \frac{\sigma_x^2}{\prod_{i=0}^{M-1}\left(\sigma_{x_i}^2\right)^{1/n_i}}. \tag{3.38}$$


Since $\alpha_k^2 = M/n_k$ in this case, we see from (3.33) that the variance of the $k$th quantizer noise is independent of $k$, and is given by

$$\sigma_{q_k}^2 = c\, 2^{-2b}\prod_{i=0}^{M-1}\left(\sigma_{x_i}^2\right)^{1/n_i}. \tag{3.39}$$

The contribution to the output noise variance $\sigma_\perp^2$ coming from the $k$th quantizer ($k$th term in (3.36)) is proportional to $1/n_k$.

Summarizing, the above expressions are applicable to any subband coder (possibly unequal decimation ratios, but maximally decimated) with orthonormal filters, under the noise model assumptions stated at the beginning of Sec. 3. The further special case where $n_k = M$ has been reported in many references in the past [16], [17], [25]. All these references assume ideal nonoverlapping subband filters, but that assumption is not necessary as the above analysis shows; orthonormality (paraunitariness in the uniform case) is really sufficient.

4.  GENERAL  ORTHOGONAL  TRANSFORM  CONVOLVER 

The optimal coding gain (3.23) depends on the choice of the paraunitary matrix $\mathbf{E}(z)$. A natural problem of interest here is the choice of the optimal paraunitary $\mathbf{E}(z)$ of a given degree $J$ (for a fixed number of channels $M$) which further maximizes the coding gain. In general this is a difficult problem, although some progress can be made in the special case where $J = 0$, i.e., $\mathbf{E}(z)$ is a constant unitary matrix $\mathbf{T}$. This is shown in Fig. 4.1(a). We will now consider the optimization problem for this special case. This special case is particularly attractive because the analysis filters $H_k(z)$ have length $\le M$ (which could be much smaller than the lengths of $x(n)$ and $g(n)$). In this case the complexity of implementing the analysis and synthesis filters is negligible (compared to the complexity of the convolutions $x_k(n) * g_k(n)$), and can therefore be disregarded. However, significant coding gain can still be achieved, as we will demonstrate later.

With $\mathbf{T}$ taken to be unitary, i.e., $\mathbf{T}^\dagger\mathbf{T} = \mathbf{I}$, the system is a paraunitary perfect reconstruction filter bank [2]. This is similar to the orthogonal transform coding system [16]. The convolution theorem (Theorem 2.1) clearly continues to hold in this case, and so do the coding gain expressions of the previous section. We will now address the problem of finding the optimal $\mathbf{T}$ that maximizes the coding gain (3.17) under optimal bit allocation. It will again be assumed that the signals $x(n)$, $g(n)$ and the matrix $\mathbf{T}$ are real. We will first simplify the expression (3.17) by writing $\sigma_{x_k}^2$ and $\alpha_k^2$ directly in terms of $\mathbf{T}$.

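Expressions for $\sigma_{x_k}^2$ and $\alpha_k^2$.

First refer to Fig. 4.1(a). Define the vectors $\mathbf{x}(n)$ and $\tilde{\mathbf{x}}(n)$ as

$$\mathbf{x}(n) = \begin{bmatrix} x(Mn) \\ x(Mn-1) \\ \vdots \\ x(Mn-M+1) \end{bmatrix}, \quad \tilde{\mathbf{x}}(n) = \begin{bmatrix} x_0(n) \\ x_1(n) \\ \vdots \\ x_{M-1}(n) \end{bmatrix}. \tag{4.1}$$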

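Then $\tilde{\mathbf{x}}(n) = \mathbf{T}\mathbf{x}(n)$. Assuming that $x(n)$ is WSS, the vector processes $\mathbf{x}(n)$ and $\tilde{\mathbf{x}}(n)$ are WSS. Define the autocorrelations

$$\mathbf{R}_{\tilde{x}\tilde{x}} = E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^\dagger(n)], \quad \text{and} \quad \mathbf{R}_{xx} = E[\mathbf{x}(n)\mathbf{x}^\dagger(n)]. \tag{4.2}$$

Then

$$\mathbf{R}_{\tilde{x}\tilde{x}} = \mathbf{T}\mathbf{R}_{xx}\mathbf{T}^\dagger. \tag{4.3}$$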


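The quantity $\sigma_{x_k}^2$ is the $k$th diagonal element $[\mathbf{T}\mathbf{R}_{xx}\mathbf{T}^\dagger]_{kk}$ of the subband autocorrelation matrix, so that the product of these (which appears in the denominator of (3.17)) is given by

$$\prod_{k=0}^{M-1}\sigma_{x_k}^2 = \prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{R}_{xx}\mathbf{T}^\dagger\right)_{kk}. \tag{4.4}$$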

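Next refer to Fig. 4.1(b). Define the vectors $\mathbf{g}^{(i)}(n)$ and $\tilde{\mathbf{g}}^{(i)}(n)$ as in (3.20). We then have $\tilde{\mathbf{g}}^{(i)}(n) = \mathbf{T}\mathbf{g}^{(i)}(n)$. Thus

$$\alpha_k^2 = \sum_{i=0}^{M-1}\sum_n \left(\tilde{\mathbf{g}}^{(i)}(n)[\tilde{\mathbf{g}}^{(i)}(n)]^\dagger\right)_{kk} \quad \text{(from the definition (3.8))}$$

$$= \sum_{i=0}^{M-1}\sum_n \left(\mathbf{T}\mathbf{g}^{(i)}(n)[\mathbf{g}^{(i)}(n)]^\dagger\mathbf{T}^\dagger\right)_{kk} = \left(\mathbf{T}\mathbf{R}_{gg}\mathbf{T}^\dagger\right)_{kk}, \tag{4.5}$$

where

$$\mathbf{R}_{gg} = \sum_{i=0}^{M-1}\sum_n \mathbf{g}^{(i)}(n)[\mathbf{g}^{(i)}(n)]^\dagger. \tag{4.6}$$

Summarizing, the coding gain (3.17) can be expressed as

$$G_{TC}(M) = \frac{\sigma_x^2\sum_n |g(n)|^2}{\left[\prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{R}_{xx}\mathbf{T}^\dagger\right)_{kk}\prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{R}_{gg}\mathbf{T}^\dagger\right)_{kk}\right]^{1/M}}. \tag{4.7}$$

The subscript TC stands for 'transform coding'. The expression (4.7) holds under the optimal bit allocation condition (3.16). The unitary matrix $\mathbf{T}$ should be chosen so as to minimize the product in the denominator.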

Properties of the matrices $\mathbf{R}_{xx}$ and $\mathbf{R}_{gg}$. The $M \times M$ matrix $\mathbf{R}_{xx}$ is the autocorrelation matrix derived from a scalar WSS process $x(n)$, and is therefore Hermitian, Toeplitz, and positive semidefinite. It is also positive definite unless $x(n)$ is harmonic (i.e., the power spectrum is made of impulses $\delta(\omega - \omega_k)$). It can be shown that $\mathbf{R}_{gg}$ also has all these properties, i.e., it is Hermitian, Toeplitz and positive definite unless $G(e^{j\omega})$ is made of impulses (see Appendix C). In fact it turns out that $[\mathbf{R}_{gg}]_{l,m} = \sum_n g(n)g^*(n + l - m)$, so that it is a deterministic autocorrelation matrix.

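The problem of finding the optimal transformation $\mathbf{T}$ therefore reduces to the following: given the $M \times M$ Hermitian, Toeplitz and positive definite matrices $\mathbf{R}_{xx}$ and $\mathbf{R}_{gg}$, find a unitary matrix $\mathbf{T}$ such that

$$\prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{R}_{xx}\mathbf{T}^\dagger\right)_{kk}\prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{R}_{gg}\mathbf{T}^\dagger\right)_{kk} \tag{4.8}$$

is minimized.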

Given a Hermitian positive definite matrix $\mathbf{P}$, consider the product $\prod_{k=0}^{M-1}\left(\mathbf{T}\mathbf{P}\mathbf{T}^\dagger\right)_{kk}$, where $\mathbf{T}$ is constrained to be unitary. It is known that this product is minimized if and only if the columns of $\mathbf{T}^\dagger$ are eigenvectors of $\mathbf{P}$. (This is how the traditional Karhunen-Loeve transform (KLT) is obtained [16].) Under this condition $\mathbf{T}\mathbf{P}\mathbf{T}^\dagger$ is diagonal. However, in our case, two positive definite matrices are involved. The problem of finding a single unitary matrix $\mathbf{T}$ that minimizes the product (4.8) does not appear to have a simple, known solution.
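The single-matrix statement above can be checked numerically in the smallest nontrivial setting. The sketch below is an illustration under stated assumptions: it restricts attention to real 2 x 2 rotations $\mathbf{T}(\theta)$, and the matrix `P` and the helper name `diag_product` are hypothetical. It confirms that the product of the diagonal entries of $\mathbf{T}\mathbf{P}\mathbf{T}^T$ never falls below $\det \mathbf{P}$ and reaches it at the diagonalizing (KLT) angle.

```python
import math

def diag_product(P, theta):
    """Product of the diagonal entries of T P T^T for the rotation
    T = [[cos, sin], [-sin, cos]] and a real symmetric 2x2 matrix P."""
    c, s = math.cos(theta), math.sin(theta)
    a, b, d = P[0][0], P[0][1], P[1][1]
    d0 = c * c * a + 2 * c * s * b + s * s * d   # (T P T^T)_00
    d1 = s * s * a - 2 * c * s * b + c * c * d   # (T P T^T)_11
    return d0 * d1

# Hypothetical symmetric positive definite matrix (not from the paper).
P = [[2.0, 0.8], [0.8, 1.0]]
det_P = P[0][0] * P[1][1] - P[0][1] ** 2

# Angle at which the off-diagonal entry of T P T^T vanishes,
# i.e. the rows of T are eigenvectors of P.
theta_klt = 0.5 * math.atan2(2 * P[0][1], P[0][0] - P[1][1])

# Sweep the rotation angle in one-degree steps.
products = [diag_product(P, math.pi * t / 180) for t in range(180)]
```

Because the trace is invariant under the rotation, pushing the two diagonal entries apart (which is what diagonalization does) lowers their product; the sweep in `products` makes this visible for one example but does not address the two-matrix problem (4.8).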

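If the matrices $\mathbf{R}_{xx}$ and $\mathbf{R}_{gg}$ are diagonalizable by the same unitary matrix $\mathbf{T}$, then this $\mathbf{T}$ maximizes the coding gain. This condition for simultaneous diagonalization is equivalent to either of the following two conditions [33]:

1. $\mathbf{R}_{xx}$ and $\mathbf{R}_{gg}$ commute, i.e., $\mathbf{R}_{xx}\mathbf{R}_{gg} = \mathbf{R}_{gg}\mathbf{R}_{xx}$.

2. $\mathbf{R}_{xx}\mathbf{R}_{gg}$ is Hermitian.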

For the special case of 2 x 2 real matrices (i.e., $M = 2$, and $x(n)$, $g(n)$ and $\mathbf{T}$ are real), the above conditions are satisfied for the following reason: The matrices $\mathbf{R}_{xx}$ and $\mathbf{R}_{gg}$ are 2 x 2 symmetric Toeplitz, so that they are also circulant. But circulant matrices commute [34]. The two matrices are simultaneously diagonalizable by the unitary matrix

$$\mathbf{T} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}. \tag{4.9}$$

With this choice of $\mathbf{T}$ the coding gain reduces to

$$G_{TC}(2) = \frac{1}{\sqrt{(1 - \rho_x^2)(1 - \rho_g^2)}}, \tag{4.10}$$

where $\rho_x = E[x(n)x^*(n-1)]/\sigma_x^2$, and $\rho_g = \sum_n g(n)g^*(n-1)/\sum_n |g(n)|^2$. For example, if $\rho_x = \rho_g = 0.95$ then the coding gain is $G_{TC}(2) = 10.26$.
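As a quick numerical cross-check (a sketch, not part of the original derivation), the closed form (4.10) can be compared against a direct evaluation of (4.7) for $M = 2$ with the matrix of (4.9). The function names are hypothetical, and the normalization $\sigma_x^2 = \sum_n |g(n)|^2 = 1$ is an assumption made for convenience; it does not affect the gain. The closed form evaluates to about 10.26 at $\rho_x = \rho_g = 0.95$.

```python
import math

# The 2x2 matrix of (4.9).
T = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [-1 / math.sqrt(2), 1 / math.sqrt(2)]]

def transformed_diagonal(rho):
    """Diagonal of T R T^T for the normalized matrix R = [[1, rho], [rho, 1]]."""
    R = [[1.0, rho], [rho, 1.0]]
    return [sum(T[k][i] * R[i][j] * T[k][j]
                for i in range(2) for j in range(2))
            for k in range(2)]

def gain_4_7(rho_x, rho_g):
    """Direct evaluation of (4.7) with M = 2 and unit-power normalization."""
    dx = transformed_diagonal(rho_x)
    dg = transformed_diagonal(rho_g)
    return 1.0 / math.sqrt(dx[0] * dx[1] * dg[0] * dg[1])

def gain_4_10(rho_x, rho_g):
    """Closed form (4.10)."""
    return 1.0 / math.sqrt((1 - rho_x ** 2) * (1 - rho_g ** 2))

direct = gain_4_7(0.95, 0.95)
closed = gain_4_10(0.95, 0.95)
```

The diagonal entries returned by `transformed_diagonal` are $1 + \rho$ and $1 - \rho$, which is where the factors $(1 - \rho^2)$ in (4.10) come from.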


Bound  on  the  coding  gain. 

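For a Hermitian positive definite matrix $\mathbf{P}$, we have $\prod_k [\mathbf{P}]_{kk} \ge \det \mathbf{P}$ with equality if and only if $\mathbf{P}$ is diagonal. Using this we see that the gain (4.7) is bounded as

$$G_{TC}(M) \le \frac{\sigma_x^2\sum_n |g(n)|^2}{\left([\det \mathbf{R}_{xx}][\det \mathbf{R}_{gg}]\right)^{1/M}} = \frac{\sigma_x^2\sum_n |g(n)|^2}{\left(\det[\mathbf{R}_{xx}\mathbf{R}_{gg}]\right)^{1/M}}. \tag{4.11}$$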
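The bound (4.11) can be illustrated with a small numerical sketch. Everything concrete below is an assumption chosen for illustration: the input is a unit-variance AR(1) process with correlation 0.9 (not the AR(5) speech model used later), the filter `g` is a hypothetical three-tap sequence, and the transform is the orthonormal DCT-II, which may or may not be the DCT type used in the paper's experiments. The helper names are likewise hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n, d = len(A), 1.0
    for j in range(n):
        p = max(range(j, n), key=lambda r: abs(A[r][j]))
        if abs(A[p][j]) < 1e-15:
            return 0.0
        if p != j:
            A[j], A[p] = A[p], A[j]
            d = -d
        d *= A[j][j]
        for r in range(j + 1, n):
            f = A[r][j] / A[j][j]
            for c in range(j, n):
                A[r][c] -= f * A[j][c]
    return d

def dct_matrix(M):
    """Orthonormal DCT-II matrix (assumed DCT type)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / M)
             * math.cos(math.pi * k * (2 * n + 1) / (2 * M))
             for n in range(M)] for k in range(M)]

def toeplitz(r, M):
    """Symmetric Toeplitz matrix with entries r(|p - q|)."""
    return [[r(abs(p - q)) for q in range(M)] for p in range(M)]

M = 6
rho = 0.9                                  # hypothetical AR(1) correlation
g = [1.0, 0.5, 0.25]                       # hypothetical real filter g(n)

def r_gg(k):
    """Deterministic autocorrelation of g at lag k >= 0."""
    return sum(g[n] * g[n + k] for n in range(len(g) - k))

Rxx = toeplitz(lambda k: rho ** k, M)      # sigma_x^2 = 1
Rgg = toeplitz(r_gg, M)
numerator = 1.0 * r_gg(0)                  # sigma_x^2 * sum |g(n)|^2

def gain(T):
    """Coding gain (4.7) for a real orthogonal T."""
    prod = 1.0
    for R in (Rxx, Rgg):
        A = matmul(matmul(T, R), transpose(T))
        prod *= math.prod(A[k][k] for k in range(M))
    return numerator / prod ** (1.0 / M)

identity = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
gain_identity = gain(identity)             # no transform: gain of 1
gain_dct = gain(dct_matrix(M))
bound = numerator / (det(Rxx) * det(Rgg)) ** (1.0 / M)
```

For any orthogonal `T` the value returned by `gain` cannot exceed `bound`; how close a particular transform comes depends on the signal and filter statistics, so this example says nothing about the speech results reported in Sec. 5.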


5.  NUMERICAL  EXAMPLES 

In the following examples, we will demonstrate the coding gains of the paraunitary convolvers. The signals $x(n)$, $g(n)$, and the number of subbands $M$ are chosen as follows:

1. Number of subbands $M = 6$ in all cases.

2. Many choices of $g(n)$ are used, but all of these are such that $G(e^{j\omega})$ is lowpass as demonstrated in Fig. 5.1. All choices have the same band edges. To obtain different stopband attenuations, we change the length of $g(n)$, but retain the same band edges for $G(e^{j\omega})$.

3. The input signal $x(n)$ is taken to be an autoregressive process of order five [i.e., an AR(5) process]. The autocorrelation coefficients $R(k)$, for $0 \le k \le 5$, are obtained from Table 2.2 of [16] (lowpass speech source). Where necessary, the power spectrum is computed as $S_{xx}(e^{j\omega}) = \alpha/\left|1 + \sum_{n=1}^{5} a_n e^{-j\omega n}\right|^2$, where $a_n$ are the autoregressive coefficients (obtainable by solving the optimal fifth order linear-prediction problem [16]).

Fig. 5.2 shows the coding gain of the paraunitary convolver (with optimal bit allocation) as a function of the stopband attenuation of $G(e^{j\omega})$, for three cases. The topmost curve corresponds to the ideal SBC convolver. In other words, the analysis and synthesis filters are as in Fig. 3.3 (ideal brickwall filters, Eq. (3.24)). The bottom curve is for the DCT convolver, that is, an orthogonal transform convolver (Fig. 4.1) in which the matrix $\mathbf{T}$ is taken to be the 6 x 6 DCT matrix. (Four types of DCT matrix have been defined in the literature; we have used the one in Eq. (12.157) of [16].)‡ The middle curve shows the upper bound (4.11) for the orthogonal transform convolver. It is interesting to note that the DCT system is only about 0.5 dB worse than the bound. The ideal brickwall SBC convolver is about 2.5 dB better than the DCT convolver. The DCT convolver, however, is very simple to implement (less expensive than good filters approximating the ideal SBC filters). In all the above cases the coding gain improves with the attenuation of $G(e^{j\omega})$ because the AM/GM ratio in Eq. (3.17) improves.

In the above experiment suppose we take $g(n) = \delta(n)$. Then the coding gain of the convolver is equal to the coding gain of the traditional subband coding system. For the ideal SBC filters, this value is $G = 6.72$ dB, and for transform coding with DCT this is 5.3 dB (consistent with experiments on speech coding; for example, see page 542 of [16]). Thus, the additional gain seen in Fig. 5.2 is contributed by the filter $G(e^{j\omega})$ participating in the subband convolver.

We have not shown plots of the coding gain with respect to the number of channels $M$, as they do not reveal more insight than what is already known in subband coding practice [16], [36].

6.  CONCLUDING  REMARKS 

In this paper we have introduced the convolution theorems for filter bank transformers. Both uniform and nonuniform decimation ratios were considered, and the theorems simplified for the case of paraunitary and orthonormal convolvers. Expressions for optimal bit allocation and the optimized coding gain were derived, and numerically demonstrated. The contribution to coding gain comes partly from the nonuniformity of the signal spectrum $S_{xx}(e^{j\omega})$ and partly from nonuniformity of the filter spectrum $|G(e^{j\omega})|^2$. Thus, even if $x(n)$ is nearly white, the coding gain can be large. With $g(n)$ taken to be the unit pulse function $\delta(n)$, the coding gain expressions reduce to those for traditional subband and transform coding, many of which are well-known.

‡ The motivation for the use of the DCT is that, in traditional speech coding, it is known to be an excellent substitute for the optimal (KLT) transform.

The paraunitary (more generally orthonormal) convolver has about the same computational complexity as a traditional convolver, if the analysis bank has small complexity compared to the convolution itself. Such, indeed, is the case for the orthogonal transform convolver (Fig. 4.1), where the analysis filter bank has filter lengths $\le M$ (number of bands). In spite of this simplicity, the coding gain obtainable can already be quite significant. Even though there is no closed form expression for the optimal orthogonal convolver matrix $\mathbf{T}$, we could derive an upper bound for the gain (for fixed $M$), and the DCT matrix offers a gain very close to this bound for the case of speech signals.

Appendix  A 

If $k = m$ we can rewrite (2.15) as $\sum_n f_k(n) f_k^*(n + n_k(\ell - i)) = \delta(\ell - i)$ and (2.17) as $\sum_n f_k(n) f_k^*(n + n_k p) = \delta(p)$. Evidently these imply each other.

Next let $k \ne m$. First assume that (2.15) holds. Recall $n_{k,m} = \gcd(n_k, n_m)$. Thus, there exist integers $a$ and $b$ such that $n_k a - n_m b = n_{k,m}$. Therefore, given any integer $p$ there exist integers $\ell$ and $i$ such that $n_k\ell - n_m i = n_{k,m}p$. Thus the left hand side in (2.17) can always be rewritten to resemble the left hand side of (2.15). Since $k \ne m$, this left hand side is indeed zero, so that the left hand side of (2.17) is zero as well. Conversely let (2.17) be true. Given a pair of integers $\ell, i$ we can always write $n_k\ell - n_m i = n_{k,m}p$ for some integer $p$. So the left side of (2.15) can be rewritten to resemble the left side of (2.17). Since $k \ne m$, (2.17) says that this is zero, so that the same follows for (2.15).

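The number-theoretic step in Appendix A (for any integer $p$ there are integers $\ell$ and $i$ with $n_k\ell - n_m i = \gcd(n_k, n_m)\,p$) is constructive. The sketch below uses the extended Euclidean algorithm to produce one such pair; the function names and the sample values are hypothetical and serve only as an illustration.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b) for positive a, b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def solve(n_k, n_m, p):
    """Find integers (l, i) with n_k*l - n_m*i == gcd(n_k, n_m) * p."""
    g, s, t = extended_gcd(n_k, n_m)
    # s*n_k + t*n_m = g, so scaling by p and flipping the sign of t works.
    return s * p, -t * p

# Hypothetical decimation ratios and shift.
n_k, n_m, p = 6, 4, 5
l, i = solve(n_k, n_m, p)
```

The pair returned is not unique: adding $n_m/n_{k,m}$ to $\ell$ and $n_k/n_{k,m}$ to $i$ gives another solution.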

Appendix  B 

If the filter bank in Fig. 1.1(a) (with $n_k = M$ for all $k$) is paraunitary, then the matrix $\mathbf{R}(z)$ (Fig. 1.2) is, in particular, paraunitary. This implies (2.5), which in turn means $\sum_n |f_k(n)|^2 = 1$. Since perfect reconstruction is achieved by choosing $f_k(n) = h_k^*(-n)$, we also have $\sum_n |h_k(n)|^2 = 1$. So all the analysis filters have the same energy.


Appendix  C 


Since $\mathbf{R}_{xx}$ is the autocorrelation matrix obtained from a scalar WSS process $x(n)$, it is positive semidefinite. It is therefore positive definite if and only if it is nonsingular. If this matrix is singular, then there exists $\mathbf{v} \ne 0$ such that $\mathbf{v}^\dagger\mathbf{R}_{xx}\mathbf{v} = 0$, i.e., $E[|\mathbf{v}^\dagger\mathbf{x}(n)|^2] = 0$, i.e., $\mathbf{v}^\dagger\mathbf{x}(n) = 0$. In other words, there exists an FIR filter $V(z) = v_0^* + v_1^* z^{-1} + \ldots + v_{M-1}^* z^{-(M-1)}$ such that the output in response to the WSS process $x(n)$ is zero. Thus if $S_{xx}(e^{j\omega})$ denotes the power spectrum of $x(n)$, then the power spectrum of the output is $S_{xx}(e^{j\omega})|V(e^{j\omega})|^2 = 0$. Since the FIR filter $V(z)$ can have at most $M - 1$ zeros on the unit circle, this means that the power spectrum $S_{xx}(e^{j\omega})$ is a sum of at most $M - 1$ impulses $\delta(\omega - \omega_i)$, i.e., $x(n)$ is a harmonic process. Thus, unless $x(n)$ is harmonic, $\mathbf{R}_{xx}$ is positive definite. This is a well-known fact [35], and is reviewed here only for completeness.

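Next consider $\mathbf{R}_{gg}$ defined in (4.6). Using the definition of $\mathbf{g}^{(i)}(n)$ in (3.20) we see that

$$[\mathbf{R}_{gg}]_{pq} = \sum_{i=0}^{M-1}\sum_n g(Mn + i - p)\, g^*(Mn + i - q) = \sum_\ell g(\ell - p)\, g^*(\ell - q) = R_{gg}(q - p), \tag{C.1}$$

where $R_{gg}(k)$ is the deterministic autocorrelation of the sequence $g(n)$. Thus $\mathbf{R}_{gg}$ is a deterministic autocorrelation matrix and has all the properties of $\mathbf{R}_{xx}$. $\mathbf{R}_{gg}$ can be written as

$$\mathbf{R}_{gg} = \sum_n \begin{bmatrix} g(n) \\ g(n-1) \\ \vdots \\ g(n-M+1) \end{bmatrix} \begin{bmatrix} g^*(n) & g^*(n-1) & \cdots & g^*(n-M+1) \end{bmatrix}. \tag{C.2}$$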

If this is singular, then there exists a vector $\mathbf{c} \ne 0$ such that $\mathbf{c}^\dagger\mathbf{R}_{gg}\mathbf{c} = 0$. Thus, for each $n$ in (C.2), we must have $c_0^* g(n) + c_1^* g(n-1) + \cdots + c_{M-1}^* g(n-M+1) = 0$, where at least one $c_i$ is nonzero. Proceeding as in the previous paragraph, we see that this happens only if $G(e^{j\omega})$ is either zero or made of at most $M - 1$ impulses.

Fig. 3.1. The quantizer and its noise model.

Fig. 3.3. Magnitude responses of ideal brickwall analysis filters. Synthesis filters for perfect reconstruction have the same magnitude responses.


Fig. 3.4. Magnitude responses of ideal analysis filters, for a well-known class of nonuniform filter banks.

Fig. 5.2. Demonstration of the coding gains of paraunitary convolvers.

Abstract

The use of an adaptive tree structure using wavelet packets as a generalized wavelet decomposition for signal compression was recently introduced by Coifman, Meyer, Quake, and Wickerhauser [1]. The idea is to decompose a discrete signal using all possible wavelet packet bases of a given wavelet kernel, and then to find the "best" wavelet packet basis. Unlike the work in [1], in this paper we employ a framework that includes both rate and distortion. A fast algorithm is formulated to "prune" the complete tree, signifying the entire library of admissible wavelet packet bases, into that best basis subtree which minimizes the global distortion for a given coding bit budget. Arbitrary finite quantizer sets are assumed at each hierarchical level of the basis-family tree. Finally, a DCT "wavelet packet" basis quadtree segmentation is described as an image coding application in a JPEG environment, with good improvement shown over non-adaptive JPEG quantization.

1 Introduction

Source compression for stripping redundancy from typical highly correlated sources like speech and image waveforms has been studied extensively. Some popular techniques addressed in the literature include vector quantization (VQ), linear predictive coding, linear transform coding (like the KLT and the DCT), and subband coding, as well as various hybrid combinations of these.

VQ is a popular and powerful scheme for compressing correlated discrete signal sets whose characteristics have been "trained" initially, but its complexity grows exponentially with vector dimensionality. Linear transformations like the DCT are less computationally demanding, but owing to their "fixed" non-adaptive nature, their compression potential relies heavily on the stationarity of the signal. For non-stationary sources, linear transforms or prediction techniques generally fail to exploit all of the source redundancy present. If one could combine the adaptability of VQ with the speed of linear transform coding, one could achieve a coding scheme which adapts to signal non-stationarities without sacrificing computational ease. Wavelet packets, introduced by Coifman, Meyer, Quake, and Wickerhauser (CMQW) [1, 2], to be described in Section 2, permit exactly this combination, and offer a flexible yet computationally non-overwhelming framework in which to undertake efficient signal compression.

This paper is organized as follows: Section 2 provides a brief description of the background information on which the rest of the paper is founded, while also outlining the scope of applicability of this work and its relation to existing literature. Section 3 highlights the intuition and main idea of the algorithm. Section 4 states the problem formally, while Section 5 undertakes a fast solution to the problem. Section 6 flowcharts the complete algorithm. Finally, Section 7 provides an image coding application using quadtree segmentation, based on our fast algorithm.


2  Background  and  scope  of  this  work 

This section deals with a brief explanation of wavelet packets, a summary of bit allocation techniques based on operational rate-distortion theory, a brief citation of existing literature and the contribution of this paper, as well as the scope of applicability of the algorithm described here.

2.1  Wavelet  packets 

Wavelet packets (WP) were introduced recently by CMQW as a family of orthonormal (ON) bases for discrete functions of $R^N$, and include the well-known wavelet basis and the Short-Time-Fourier-Transform-like (STFT) basis as its members. While a brief description of wavelet packets, together with an intuitive feel for what they represent, will be provided here, the interested reader is referred to [1] and [2] for a detailed and mathematical treatment of the subject.

Wavelet packets represent a generalization of the method of multiresolution decomposition, and comprise the entire family of subband coded (tree) decompositions, from which the optimal decomposition subtree can be selected, to maximize compression by permitting the signal characteristics to be matched "on the fly." Thus, the potential of the CMQW wavelet packet decomposition scheme lies in its capacity to offer a rich menu of ON bases, from which the "best basis" can be chosen. If one represents the complete subband decomposition of a discrete signal set in $R^N$ as a regular analysis tree of depth $\log N$ (see Figure 1), the CMQW approach permits a decomposition topology to be picked corresponding to any pruned subtree of the original tree, i.e. any subtree sharing the same root as the original tree. This is obviously isomorphic to all permissible subband topologies (see Figure 2), with the collection of terminal nodes (leaves) of every pruned subtree representing the entire library of permissible ON bases with which to decompose the original signal.
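To give a feel for the size of this library, the short sketch below counts the pruned subtrees of a complete binary tree. It assumes binary splitting in which a node is either left as a leaf or split into both of its children; the function name `num_bases` is hypothetical. A tree of depth 0 offers one choice, and a tree of depth $d$ offers one choice for "do not split the root" plus the product of the choices available in the two depth-$(d-1)$ subtrees.

```python
def num_bases(depth):
    """Number of pruned subtrees (admissible bases) of a complete
    binary tree of the given depth, via N(d) = N(d-1)**2 + 1."""
    count = 1  # depth 0: only the root itself
    for _ in range(depth):
        count = count * count + 1
    return count

library_sizes = [num_bases(d) for d in range(6)]
```

The count grows roughly doubly exponentially with depth, which is why an exhaustive comparison of all bases is impractical for deep trees and a fast pruning procedure is of interest.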

Thus, this decomposition might be used to code independent segments of a given non-stationary signal. It enables the coder to exhibit, for example, a STFT-like characteristic (regular tree) at one source instance, a wavelet characteristic (logarithmic tree) at another instance, or any intermediate characteristic (arbitrary WP subtree) at yet other instances, to best match the signal's non-stationary statistics. See Figure 2. In this powerful scenario, the popular wavelet and STFT decompositions are mere special cases of permissible WP structures. To emphasize, the CMQW ON decomposition enables each internal tree node (ON basis parent-member) to spawn off, as its replacement, branch nodes (ON basis child-members) that provide a complete, disjoint basis cover for the space spanned by their parent. This property is vital to the development of the fast pruning algorithm, as it enables the coding rate and distortion corresponding to any node in the tree to be additive over the rates and distortions associated with the branches emanating from that node, with respect to an $l_2$ norm (like m.s.e. or weighted m.s.e.) criterion.

Figure 1: Complete wavelet packet tree of depth $\log N$ to code signal block of dimension $N$. Each node $n_j$ contains the basis vector $b_j$ with WP coefficient vector $c_j$. The complete set of all pruned subtrees represents the library of all admissible WP bases, or equivalently, all subband decomposition topologies.

Figure 2: (a) All possible binary wavelet packet decompositions of depth 2. (b) Some typical depth-3 binary wavelet packet subtree decompositions. Note that $H_1$ and $H_0$ represent the "low pass" and "high pass" analysis filters.

2.2  Quantization  and  bit  allocation 

The problem of bit allocation, where a given bit budget must be distributed efficiently among a set of given admissible quantization choices, is a classical problem in signal compression that has received exhaustive treatment in source coding literature [3, 4, 5, 6]. A classical framework for source coding is Shannon's rate-distortion theory, which deals with minimization of source distortion subject to a channel rate constraint, or the dual problem of minimization of channel rate subject to a distortion constraint. A practical coding environment involves a finite set of admissible quantizers, characterized by their (operational) rate-distortion functions, ranging from convex [3] to completely arbitrary [4]. These quantizers are used by the allocation algorithm to determine the best strategy to minimize the overall coding distortion subject to a total bitrate budget constraint. We use this framework to seek our best basis WP and best quantizer choices.

2.3  Related  work  and  contribution  of  this  paper 

While the adaptivity and the speed of the best-basis search of [1] are unmistakable, the cost criterion and the coding (quantization) method used there to exploit this speed and flexibility are somewhat ad hoc. In this paper we formulate a fast algorithm, for a given total coding bitrate budget, to pick the optimal WP basis, together with the optimal quantizer choice for that optimal WP subtree, for each of the independent segments or "blocks" that the signal comprises. Optimality is with respect to a global distortion criterion that is additive over the signal blocks, e.g. m.s.e. or weighted¹ m.s.e. We conduct our best basis hunt in a rate-distortion (R-D) framework that considers both aspects (i.e. rate and distortion) of the coding problem. This is a generalization of the treatment in [1, 2] where a one-sided "entropy" or m.s.e. distortion criterion is used. Our approach could be viewed, in its quadtree application, as an extension of the work by Shoham and Gersho [4] to provide a fast algorithm covering hierarchies of admissible quantizers. It may also be regarded, in its quadtree segmentation application, as a generalization of Chou et al.'s (G-BFOS) algorithm [6] to the case where monotonicity constraints² of rate and distortion with tree depth are removed. Figure 3(a) gives an example of a rate-distortion characteristic that is constrained to be monotonic with tree depth as identified by a single transition from the "merge" to "split" boundaries, a constraint that is necessary for the quadtree algorithm mentioned in [6]. Figure 3(b) shows a non-restrictive case, where arbitrary transitions between the "split" and "merge" regions are permitted. A practical contribution of this paper involves the description of the results of a quadtree-based image compression application using a family of DCT "wavelet packet" bases. Our application is similar to that of independently done work by Sullivan and Baker [7], who performed efficient quadtree segmentation using VQ. Our example uses classified quantizers in a JPEG (DCT-based) [8] coding environment.

¹ In image processing applications, these weights may be designed, for example, to be commensurate with those of the Human Visual System.

While bit allocation strategies for various coding environments have been formulated in the literature, the problem of using arbitrary quantizers in a generalized multiresolution wavelet decomposition framework has not, to the best of the authors' knowledge, been addressed.

2.4  Scope  of  applicability  of  our  algorithm 

It  must  be  emphasized  that  the  DCT-basis  family  tree  of  our  application  and 
the  standard  basis  tree  employed  by  [7]  are  not  strictly  WP  trees,  which  are 
derived  recursively  using  Quadrature  Mirror  Filters  (QMF)  filter  banks  or  us¬ 
ing  multi-resolution  wavelet  analysis  (whose  equivalence  has  been  established 
[12,  9]).  The  scope  of  applicability  of  our  algorithm  extends  to  all  classes  of 
structures  which  permit  the  construction  of  a  hierarchy  of  basis  covers  for  the 
input  signal  space.  While  this  obviously  includes  structures  like  quadtrees 
and  orthonormally  transformed  (e.g.  DCT)  quadtrees,  to  be  described  in 
detail  in  Section  7,  other  powerful  structures  such  as  the  CMQW  multires¬ 
olution  decomposition  wavelet  packets  and  hierarchical  subband  coders  are 
also  applicable.  As  an  example,  our  algorithm  could  be  used  to  determine 

*i.e.  as  the  tree  grows,  the  rate  increa.ses  and  the  distortion  decreases 
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(a)  (b) 

Figure  3:  “Split/Merge”  boundaries  shown  for  (a)  Monotonic  case  to  which 
the  G-BFOS  algorithm  is  constrained,  and  (b)  Non-restrictive  case. 
quantitatively, such important coder design considerations as the optimal
decomposition depth for subband coding, or a performance comparison of filter
banks of different kernels and topologies, or to determine an efficient DCT
quadtree structure in a “hierarchical” JPEG application, as will be explained
in Section 7.

3  Basic  idea  of  the  algorithm 

We convert our budget-constrained search for the best wavelet packet basis
(and best quantizer) sequence into an unconstrained one by minimizing the
composite Lagrangian cost functional $J(\lambda) = D + \lambda R$, where
$\lambda^*$, the optimal Lagrange multiplier for the given budget constraint,
is found by a fast iterative scheme. The mathematical details and formal
treatment are provided in Sections 4, 5, and 8. The translation into an
unconstrained formulation makes it feasible to deal with each signal block
independently. It also converts the problem into a fast iterative search for
an operating point on the convex hull ^ of the composite operational
rate-distortion curve. This curve is a function of the input signal
characteristics, the wavelet kernel picked to generate the library of wavelet
packet bases, and the set of admissible quantizers for each wavelet packet
tree level.

As will be shown, at optimality, all nodes of all subtrees representing the
sequence of signal blocks must operate at “constant quality slope
$\lambda$”. See Figure 4. For a given $\lambda = |\Delta D / \Delta R|$, we
populate each node of each tree block independently with the Lagrangian cost
function associated with the best quantizer for that node. The best quantizer
for a particular tree node is the one which “lives” at absolute slope
$\lambda$ on the convex hull of the operational R-D curve for that node, as
shown in Figure 4. Then, by applying, in parallel for each signal block, the
pruning criterion of Figure 4 recursively on every node, starting from the
full-depth tree and proceeding towards the root, we find the sequence of best
wavelet packet bases and associated best quantizers with which to code the
signal. The recursive algorithm exploits Bellman’s optimality principle by
quickly eliminating a host of suboptimal subtrees from contention for the
optimal solution, in a manner reminiscent of the popular Viterbi algorithm,
and is similar to the pruning condition of [1], which, however, does not use
our Lagrangian cost function. Finally, we

^i.e. the convex boundary of R-D points (see Fig. 3)
Figure 4: Lagrangian cost pruning criterion for “quality criterion”
$\lambda$ for each parent node of the wavelet packet tree. This condition is
used recursively to do fast pruning from the complete tree depth towards the
root to find the optimal subtree for a given $\lambda$.
show a fast way of iterating over the Lagrange multiplier $\lambda$, in a
convex search using Newton’s method or bisection methods [10], to find the
optimal $\lambda^*$ satisfying the given budget constraint.


4  Formal  Problem  Definition 

Without loss of generality, we will consider the problem of a binary wavelet
packet decomposition tree of a discrete input signal (vector) of size $N$ in
$l^2(N)$: $(s_1, s_2, \ldots, s_N)$. See Figure 1. Though omitted for
convenience, each branch of the analysis tree consists of the appropriate
filter: high-pass filter (HPF) $H_0$ for the upper child and low-pass filter
(LPF) $H_1$ for the lower child, followed by a decimator by 2 (see Figure 2),
with the corresponding synthesis tree consisting of an upsampler followed by
the corresponding synthesis filters.

The analysis and synthesis filters of each branch satisfy the standard
orthonormality conditions of paraunitary perfect reconstruction filter banks
(PRFBs [9]). As is well known, iterating the orthonormal filter templates to
the complete tree depth results in an equivalent generalized multiresolution
decomposition tree (i.e. wavelet packet tree) whose nodes represent a family
of orthonormal bases [1, 2]. As shown in Figure 1, we assume that there are
$M$ signal blocks to be coded independently, each of size $N$. To help
provide a clear notation-free understanding of our algorithm, we introduce a
“toy” example that we will invoke at various points in this paper. The
example, shown in Figure 5, is that of coding a length-4 signal block, using
the often-cited Haar basis (or sum and difference filters, in filter-bank
jargon) as the wavelet packet kernel. Figure 2(a) shows the possible
decompositions for the given signal.

Let  us  define  the  following  terms  to  be  used  in  the  formulation: 

• $M$ : number of independently coded signal blocks.

• $N$ : number of elements in each signal block (assumed to be
one-dimensional, without loss of generality).

• $T$ : complete wavelet packet tree (STFT tree), for each signal block, of
depth $\log N$, as shown in Fig. 1.

• $t$ : any node of $T$, internal or terminal.
[Figure 5 diagram not recoverable from the source: analysis tree with branch
filters $H_0(z)$ and $H_1(z)$, each followed by decimation by 2, and the Haar
basis vectors $\{b_j^i\}$ listed per node. In the figure, $\{c_j^i\}$
represents the inner product of the signal w.r.t. the basis vector(s)
$\{b_j^i\}$ associated with node $n_j^i$.]

Figure 5: Toy example showing wavelet packet decomposition for a length-4
signal block using the popular Haar wavelet kernel.
• $S \preceq T$ : pruned subtree of $T$, i.e. a (wavelet packet basis)
subtree of $T$ that shares its root; thus, $S$ corresponds to any admissible
wavelet packet basis.

• $\tilde{S}$ : set of leaves or terminal nodes of subtree $S$.

• $q_a(t)$ : set of all admissible quantizers for node $t \in T$. The toy
example at the end of this section presents some admissible quantizer choices
for a particular case.

• $Q_a(S)$ : vector set of all admissible quantizers for the collection of
individual leaf nodes of subtree $S$
$= \{q_a(t_1) \times q_a(t_2) \times \cdots \times q_a(t_L)\}$, where
$\{t_1, t_2, \ldots, t_L\} \in \tilde{S}$ is the complete set of leaves or
terminal nodes of $S$.

• $n_j^i, b_j^i, c_j^i$ : (see Figure 1) the $j$th (of the possible $2^i$
choices) node, basis, and coefficient vector, respectively, at the $i$th
tree-depth or “scale” (for $i = 1, 2, \ldots, \log N$). Note that $b_j^i$
represents the $R^N$-basis members associated with node $n_j^i$, while
$c_j^i$ represents the inner product of the signal with the basis vectors in
$b_j^i$. See Figure 5 for a Haar basis kernel, where the length-4 signal is
broken down into all possible wavelet packet coefficients for the complete
depth-2 binary tree. Note also that to simplify notation, $\{t, b_t, c_t\}$
will be invoked where convenient.

• $D_q(t), R_q(t)$ : distortion and bitrate, respectively, associated with
quantizing wavelet packet coefficient vector $c_t$ of node $t$ using
quantizer $q \in q_a(t)$.

• $D_Q(S), R_Q(S)$ : distortion and rate, respectively, associated with
coding subtree (or wavelet packet) $S$ using quantizer $Q \in Q_a(S)$. In our
case, they are both linear tree functionals; i.e.: total distortion
$= D_Q(S) = \sum_{t \in \tilde{S}} D_q(t)$ and total rate
$= R_Q(S) = \sum_{t \in \tilde{S}} R_q(t)$.

The problem to solve, then, is that of finding, given a total budget of
$R_{budget}$ to code $M$ independent signal blocks, that sequence of (pruned)
subtree best-bases $S_i^* \preceq T$ (for $i = 1, 2, \ldots, M$) together
with their associated optimal quantizers $Q_i^* \in Q_a(S_i^*)$ which
minimize the global coding distortion. Stated mathematically, this boils down
to determining $D_{min} = \sum_{i=1}^{M} D_{Q_i^*}(S_i^*)$, where

    $D_{Q_i^*}(S_i^*) = \min_{S_i \preceq T} \left[ \min_{Q_i \in Q_a(S_i)} D_{Q_i}(S_i) \right]$    (1)

such that

    $R_{total} = \sum_{i=1}^{M} R_{Q_i^*}(S_i^*) \le R_{budget}$,    (2)

where $R_{budget}$ is the given bit budget constraint.

Toy  Example 

As an example, suppose we want to find the best wavelet packet basis
corresponding to the Haar wavelet kernel for an input signal
$s = [109, 23, -98, 13]$, with one block and dimension 4, i.e. $N = 4$,
$M = 1$, for a coding budget of 21 bits, for the following classes of
admissible quantizers for each tree level. Suppose we have three grades of
uniform quantizers (coarse, medium, and fine) having step sizes of 16, 4, and
1 respectively, or equivalently, a granularity of 16, 64, and 256 levels
(using 4, 6, and 8 bits) respectively, assuming a quantizer dynamic range
from -128 to +128. As shown in Figure 5, for convenience, the tree scales are
denoted by the labels A, B, and C. At full tree-depth C, the quantizers 1, 2,
and 3 denote the fine, medium, and coarse scalar quantizers for each of the 4
wavelet packet coefficients C1, C2, C3, and C4. At depth B, the quantizers 1,
2, and 3 denote the [fine, fine], [medium, medium], and [coarse, coarse]
combination of quantizers applied independently to each of the wavelet packet
coefficients B1 and B2. At the tree root A, similar vectors of the three
different grades of scalar quantizers are available to code the 4-D
coefficient. Assume an m.s.e. distortion criterion.

Note that the wavelet packet coefficients are the inner products of the
input signal with the respective basis vectors (see Figure 5):

    $c^0 = [109, 23, -98, 13]$

    $c_1^1 = [-60.81, 78.49]$;  $c_2^1 = [93.34, -60.1]$

    $c_1^2 = [98.5]$;  $c_2^2 = [12.5]$;  $c_3^2 = [-108.5]$;  $c_4^2 = [23.5]$
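The coefficient values listed above can be reproduced with a short sketch.
The function names below are hypothetical, and the sign convention for the
difference (high-pass) filter is an assumption chosen so that the output
matches the listed values; the paper itself only specifies “sum and
difference” filters.

```python
import math

def haar_split(c):
    """Split a coefficient vector into (high-pass, low-pass) children.

    Assumed convention: high = (odd - even) / sqrt(2),
                        low  = (even + odd) / sqrt(2).
    """
    r = math.sqrt(2.0)
    high = [(c[k + 1] - c[k]) / r for k in range(0, len(c), 2)]
    low = [(c[k] + c[k + 1]) / r for k in range(0, len(c), 2)]
    return high, low

def wavelet_packet_tree(signal):
    """Return levels[i]: the coefficient vectors of the 2**i nodes at depth i."""
    levels = [[list(signal)]]
    while len(levels[-1][0]) > 1:          # stop once nodes hold one coefficient
        children = []
        for c in levels[-1]:
            high, low = haar_split(c)
            children.extend([high, low])   # upper child first, as in the text
        levels.append(children)
    return levels

levels = wavelet_packet_tree([109, 23, -98, 13])
for depth, nodes in enumerate(levels):
    print(depth, [[round(v, 2) for v in c] for c in nodes])
```

The sketch assumes the block length is a power of two, as in the toy example.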

Figure 6 shows the rate-distortion curves for all possible basis subtrees
in our example, for the permissible quantization choices. Thus, for example,
$c^0 = [109, 23, -98, 13]$ would be quantized to $[108, 24, -96, 12]$ (for a
total squared-error distortion of 7.0) with the medium grade (step-size 4)
quantizer, and so on.
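The quantization example can be checked with a minimal sketch of a uniform
quantizer. The helper names are hypothetical, and the rate model (a fixed
number of bits per coefficient) is an assumption based on the 4, 6, and 8 bit
figures quoted in the toy example.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of `step`.

    Python's round() resolves exact ties to the even integer, which is what
    maps -98 to -96 (rather than -100) for step 4.
    """
    return [step * round(v / step) for v in values]

def squared_error(values, quantized):
    """Total squared-error distortion between a vector and its quantized form."""
    return sum((v - q) ** 2 for v, q in zip(values, quantized))

c0 = [109, 23, -98, 13]
for name, step, bits in [("coarse", 16, 4), ("medium", 4, 6), ("fine", 1, 8)]:
    q = quantize(c0, step)
    # Assumed rate model: `bits` per coefficient, fixed-length coding.
    print(name, q, squared_error(c0, q), bits * len(c0))
```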


5  Fast  solution 

We solve the constrained problem of Equation (1) by converting it to an
unconstrained problem using Lagrange multipliers. This section spells out
the unconstrained approach, and explains how our problem is a hierarchical
extension of that presented in [4]. A fast pruning algorithm is used to
remove suboptimal subtrees that would not otherwise have been eliminated if
we had resorted to a “flattened” version of our problem to emulate that
solved in [4]. Solving the unconstrained problem for different positive
values of the Lagrange multiplier results in the tracing out of convex hull
points of the rate-distortion curve. The optimal convex hull point we seek is
that with the minimum distortion while not exceeding the given rate budget.


5.1  Unconstrained  Optimization  Approach 

Instead of solving the constrained optimization problem (1), let us consider
the following unconstrained formulation. Let us introduce the Lagrangian
cost functional corresponding to the Lagrange multiplier $\lambda > 0$, for
each signal block $i$, of basis subtree $S_i \preceq T$ and subtree quantizer
set $Q_i \in Q_a(S_i)$:

    $J_i(\lambda) = J_\lambda(S_i, Q_i)$    (3)

    $= D_{Q_i}(S_i) + \lambda R_{Q_i}(S_i)$    (4)

    $= \sum_{t \in \tilde{S}_i} [D_q(t) + \lambda R_q(t)]$,    (5)

where the last equation is written in terms of the leaf nodes of the subtree,
as a result of the subtree rate and distortion functions being additive over
its leaf nodes.

We now develop, by a simple extension of Theorem 1 in [4] to include the
ensemble of wavelet packet bases $S \preceq T$ as well as their associated
quantizers $Q(S) \in Q_a(S)$, an equivalent unconstrained problem. This
formulation is attractive because it decomposes the original problem into
independent parallel optimizations for each signal block
$i = 1, 2, \ldots, M$. Mathematically stated, for a fixed value of
$\lambda$, the unconstrained problem specified below is solved for
$S_i^*, Q_i^*$ for $i = 1, 2, \ldots, M$ (for the “correct” fixed value of
$\lambda$, which is a function of the given budget constraint, and the hunt
for which will be described later, we solve the original problem to within a
convex hull approximation):


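    $J_i^*(\lambda) = J_\lambda(S_i^*, Q_i^*)$    (6)

    $= \min_{S_i \preceq T} \min_{Q_i \in Q_a(S_i)} J_\lambda(S_i, Q_i)$    (7)

    $= \min_{S_i \preceq T} \min_{Q_i \in Q_a(S_i)} [D_{Q_i}(S_i) + \lambda R_{Q_i}(S_i)]$    (8)

    $= \min_{S_i \preceq T} \sum_{t \in \tilde{S}_i} \min_{q \in q_a(t)} [D_q(t) + \lambda R_q(t)]$.    (9)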
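The step from a joint minimization over the subtree quantizer set to
independent minimizations at each leaf relies only on the additivity of rate
and distortion over the leaves. A minimal numerical sketch for one fixed
subtree follows; the leaf (rate, distortion) values and the function names
are hypothetical and purely illustrative.

```python
from itertools import product

# Hypothetical (rate, distortion) points for the three leaves of one subtree.
leaf_points = [
    [(4, 9.0), (6, 1.0), (8, 0.0)],
    [(4, 30.0), (6, 4.0), (8, 0.5)],
    [(4, 2.0), (6, 1.5), (8, 1.0)],
]

def joint_minimum(leaves, lam):
    """Search every quantizer combination Q for the subtree (joint search)."""
    best = float("inf")
    for combo in product(*leaves):
        rate = sum(r for r, _ in combo)
        dist = sum(d for _, d in combo)
        best = min(best, dist + lam * rate)
    return best

def per_leaf_minimum(leaves, lam):
    """Minimize each leaf's Lagrangian cost independently, then add them up."""
    return sum(min(d + lam * r for r, d in pts) for pts in leaves)

for lam in (0.0, 0.25, 1.0, 5.0):
    print(lam, joint_minimum(leaf_points, lam), per_leaf_minimum(leaf_points, lam))
```

The joint search visits every combination of leaf quantizers, while the
per-leaf form visits each leaf's quantizers once, which is what makes
populating the tree nodes cheap.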

In order to make notation less cumbersome, we consider the problem of a
single wavelet packet tree (i.e. $M = 1$), with the extension to $M > 1$
being trivial due to the parallelization of the problem. The problem then
becomes determining:

    $D_{Q^*}(S^*) = \min_{S \preceq T} \min_{Q \in Q_a(S)} D_Q(S)$  s.t.  $R_{Q^*}(S^*) \le R_{budget}$.    (10)

Thus, the unconstrained problem of Eq. (7) becomes finding:

    $J^*(\lambda) = J_\lambda(S^*, Q^*) = \min_{S \preceq T} \min_{Q \in Q_a(S)} [D_Q(S) + \lambda R_Q(S)]$.    (11)
The above approach identifies, for a fixed positive $\lambda$, an optimal
operating point on the convex hull of the composite rate-distortion curve for
the specified problem. If the original constrained problem happened to have a
budget constraint that “hit” one of the convex-hull operating points, then
the unconstrained and the constrained problems have identical solutions.
Mathematically stated, the equivalence is established in the following
theorem, a direct hierarchical extension of Theorem 1 of Shoham and Gersho
[4]:

Theorem 1  If $(S^*, Q^*)$ is the solution to the unconstrained problem of
Eq. (11) corresponding to some fixed value of $\lambda$, then it is also the
solution to the constrained problem of Eq. (10) for the particular case of
$R_{budget} = R_{Q^*}(S^*)$.
Proof:

    $J_\lambda(S^*, Q^*) \le J_\lambda(S, Q)$    (12)

    $D_{Q^*}(S^*) + \lambda R_{Q^*}(S^*) \le D_Q(S) + \lambda R_Q(S)$    (13)

    $D_{Q^*}(S^*) - D_Q(S) \le \lambda [R_Q(S) - R_{Q^*}(S^*)]$    (14)

    $D_{Q^*}(S^*) - D_Q(S) \le \lambda [R_Q(S) - R_{budget}]$    (15)

Since Equation (15) holds for all $S \preceq T$ and $Q \in Q_a(S)$, it
certainly holds for the subsets $\mathcal{S}$ of subtrees $S \preceq T$ and
$\mathcal{Q}$ of quantizers $Q \in Q_a(S)$ which satisfy
$R_Q(S) \le R_{budget}$. That is,

    $R_Q(S) \le R_{budget}$  for  $S \in \mathcal{S}$, $Q \in \mathcal{Q}$.    (16)

Thus, from Eq. (15) and Eq. (16), since $\lambda > 0$, we have:

    $D_{Q^*}(S^*) - D_Q(S) \le 0 \quad \forall S \in \mathcal{S}, Q \in \mathcal{Q}$,    (17)

i.e. $(S^*, Q^*)$ also satisfies the original constrained optimization
problem of Eq. (10) for the given budget constraint. □
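Theorem 1 can be sanity-checked numerically on a small flattened set of
operating points. In the sketch below each (rate, distortion) pair stands for
one (subtree, quantizer) combination; the values and function names are
hypothetical, and an exhaustive search plays the role of the constrained
problem.

```python
# Hypothetical operating points (rate, distortion) for a single block.
points = [(16, 71.0), (18, 60.0), (20, 52.0), (24, 7.0), (28, 6.0), (32, 0.0)]

def unconstrained_solution(pts, lam):
    """Point minimizing the Lagrangian cost D + lam * R."""
    return min(pts, key=lambda p: p[1] + lam * p[0])

def constrained_solution(pts, r_budget):
    """Exhaustive search: minimum distortion subject to R <= r_budget."""
    feasible = [p for p in pts if p[0] <= r_budget]
    return min(feasible, key=lambda p: p[1])

def theorem_holds(pts, lam):
    """Check that the unconstrained optimum solves the constrained problem
    whose budget equals the rate of that optimum."""
    r_star, d_star = unconstrained_solution(pts, lam)
    _, d_best = constrained_solution(pts, r_star)
    return d_star == d_best

for lam in (0.0, 0.5, 1.0, 2.0, 8.0, 20.0, 100.0):
    print(lam, unconstrained_solution(points, lam), theorem_holds(points, lam))
```

Points such as (28, 6.0) lie above the convex hull and are never returned by
the unconstrained search, which illustrates why some budgets are
“inaccessible”.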

Note again that the implication of the above result, when extended to the
case of arbitrary $M$, is that if we solve the unconstrained problem of Eq.
(8) for some $\lambda > 0$, and if $R_{budget}$ of the constrained problem of
Eq. (1) happens to be $\sum_{i=1}^{M} R_{Q_i^*}(S_i^*)$ of the unconstrained
problem, then the solutions to both problems are identical. The unconstrained
problem lends itself to a much easier solution than the constrained one,
though of course, the latter may have an “inaccessible” solution, since the
unconstrained approach maps out only convex hull points of the R-D curve.
Figure 7 shows an example of an inaccessible convex-hull solution, where the
budget constraint line does not pass through a convex-hull point. The excess
$R_{budget} - \sum_{i=1}^{M} R_{Q_i^*}(S_i^*)$ bits, which, in practice,
represent a negligible fraction of the original budget, can be allocated
using “greedy” heuristics, or using a steepest descent algorithm similar to
that of [4]. Bounds on the suboptimality of the unconstrained solution can be
found in [7]. The power of the unconstrained approach obviously lies in its
“parallelization” of the original problem into smaller independent
optimization problems.
Figure 7: Composite R-D curve showing convex hull solution to an
“inaccessible” problem. (Axes: distortion versus rate.)
5.2  “Flattening” the problem

If we define $X_i$, for each independent block $i = 1, 2, \ldots, M$, to be
the set of all admissible operating points for block $i$ (i.e. the set
comprising all combinations of subtrees and their associated quantizers,
$\{S_i, Q_i\}\ \forall S_i \preceq T, Q_i \in Q_a(S_i)$), then we have a
“flattened” version of our problem, similar to that solved in [4], i.e. the
constrained problem becomes:

    $\min_{x_i \in X_i} \sum_i D(x_i)$  subject to  $\sum_i R(x_i) \le R_{budget}$.    (18)

Though the “flattened” version of our problem can be made unconstrained and
solved as in [4], the hierarchical nature of our wavelet packet
tree-structured problem lends itself to a fast algorithm. Thus, it would be
computationally wasteful (for a tree of depth $n$, there are $O(2^n)$
subtrees!) to solve our problem by the exhaustive search method, which is
what “flattening” of our problem would entail.

However,  the  flattened  version  does  have  some  merits,  e.g.  it  can  be 
invoked  to  simplify  notation,  and  it  can  serve  as  a  base  from  which  to  inherit 
some  important  unchanged  features  of  the  unconstrained  problem.  Some  of 
the  key  results  inherited  from  [4],  as  they  apply  to  our  problem,  are  hence 
presented  as  a  summary: 

• The convex-hull optimally allocated rate and distortion values for each
block, for a given $\lambda$, are monotonic non-increasing and non-decreasing
step functions, respectively, of $\lambda$.

• The monotonic step function nature of $R^*(\lambda)$ and $D^*(\lambda)$ is
due to the discrete nature of the problem. Thus $\lambda$ could be
interpreted as an index of operating quality as it varies from
$0 \to \infty$.

• As $\lambda$ is swept through all positive real numbers, all the convex
hull points of the composite R-D curve are traced out. See Figure 7 for a
typical composite R-D curve.
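The three properties listed above can be observed on a small example. The
sketch below sweeps $\lambda$ over a grid for one block with hypothetical
(rate, distortion) operating points; the grid spacing and point values are
assumptions chosen only for illustration.

```python
# Hypothetical operating points (rate, distortion) for one block.
points = [(16, 71.0), (18, 60.0), (20, 52.0), (24, 7.0), (28, 6.0), (32, 0.0)]

def sweep(pts, lambdas):
    """Return (lambda, R*, D*) for each slope in an increasing list."""
    trace = []
    for lam in lambdas:
        r, d = min(pts, key=lambda p: p[1] + lam * p[0])
        trace.append((lam, r, d))
    return trace

trace = sweep(points, [k / 4 for k in range(0, 81)])   # lambda = 0, 0.25, ..., 20

rates = [r for _, r, _ in trace]
dists = [d for _, _, d in trace]
# R*(lambda) is a non-increasing step function, D*(lambda) non-decreasing.
print(all(a >= b for a, b in zip(rates, rates[1:])))
print(all(a <= b for a, b in zip(dists, dists[1:])))
# The distinct points visited are the convex-hull points of the R-D set.
hull = sorted({(r, d) for _, r, d in trace})
print(hull)
```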

With $J_i(\lambda) = D(x_i) + \lambda R(x_i)$ representing the customary
Lagrangian subcost for block $i$ associated with operating point $x_i$, for
quality criterion $\lambda$, let us introduce the biased Lagrangian cost $W$
as:

    $W(\lambda) = W(\lambda, \{x_i^*(\lambda)\}_{i=1}^{M})$    (19)

    $= W(\lambda, \mathbf{x}^*(\lambda))$    (20)

    $= \left( \sum_{i=1}^{M} J_i^*(\lambda) \right) - \lambda R_{budget}$    (21)

    $= \left( \sum_{i} \min_{x_i \in X_i} [D(x_i) + \lambda R(x_i)] \right) - \lambda R_{budget}$.    (22)

Note that $\mathbf{x}$ above refers to a vector notation of the sequence
$x_1, x_2, \ldots, x_M$. Then, following the optimization theory outlined in
[13], we have the following result:

Lemma 1  $W(\lambda)$ is a concave ($\cap$) function of $\lambda$.

Proof: See Appendix.

Now, if we find the maximum of $W(\lambda)$ over all positive $\lambda$,

    $W(\lambda^*) = W(\lambda^*, \mathbf{x}^*(\lambda^*)) = \max_{\lambda > 0} W(\lambda)$,    (23)

we have the following result for the unconstrained solution corresponding
to the given budget constraint $R_{budget}$.

Theorem 2  $\lambda^*$ and $\mathbf{x}^*(\lambda^*)$ that maximize $W$ in
Eq. (23) are the optimal convex hull face slope and optimal convex hull
operating point, respectively, for the unconstrained optimization problem of
Eq. (8), for the given budget constraint $R_{budget}$.

Proof: See Appendix.
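A small numerical illustration of Lemma 1 and Theorem 2 follows, using the
flattened form of Eq. (22). The two blocks, their (rate, distortion) points,
the budget of 50 bits, and the grid of slopes are all hypothetical; the check
of concavity is a discrete midpoint test on the grid, not a proof.

```python
# Hypothetical flattened operating points (rate, distortion) for two blocks.
blocks = [
    [(16, 71.0), (24, 7.0), (32, 0.0)],
    [(16, 40.0), (24, 10.0), (32, 1.0)],
]
R_BUDGET = 50

def biased_cost(blocks, lam, r_budget):
    """W(lam) = sum_i min_x [D(x) + lam * R(x)] - lam * R_budget."""
    total = sum(min(d + lam * r for r, d in pts) for pts in blocks)
    return total - lam * r_budget

grid = [k / 8 for k in range(0, 81)]                 # lambda = 0, 0.125, ..., 10
w = [biased_cost(blocks, lam, R_BUDGET) for lam in grid]

# Discrete concavity check: each interior value is at least the average of
# its two neighbours on the uniform grid.
concave = all(w[k] >= 0.5 * (w[k - 1] + w[k + 1]) - 1e-9
              for k in range(1, len(w) - 1))
lam_star = grid[w.index(max(w))]
print(concave, lam_star, max(w))
```

For these numbers the maximizer on the grid is a slope at which one block
switches between two hull points, i.e. a singular value of $\lambda$.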

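Thus, the above result gives the condition on the desired operating quality
slope which solves the flattened version of our original problem. By
“unflattening” the above result, we now develop the unconstrained solution
to our best wavelet packet basis and optimal quantization choice problem. The
optimal slope $\lambda^*$ is the solution to:

    $W(\lambda^*) = W(\lambda^*, \mathbf{x}^*(\lambda^*))$    (24)

    $= \max_{\lambda > 0} \left( \left[ \sum_{i=1}^{M} \min_{S_i \preceq T} \min_{Q_i \in Q_a(S_i)} J_\lambda(S_i, Q_i) \right] - \lambda R_{budget} \right)$    (25)

    $= \max_{\lambda > 0} \left( \left[ \sum_{i=1}^{M} \min_{S_i \preceq T} \left\{ \sum_{t \in \tilde{S}_i} \min_{q \in q_a(t)} [D_q(t) + \lambda R_q(t)] \right\} \right] - \lambda R_{budget} \right)$.    (26)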
This is then the unconstrained optimization formulation of our problem,
which can be dissected into independent fast individual optimizations. Thus,
Eq. (26) can be dissected into the following 3 optimizations:

• At first (innermost minimization), the best quantizer choice for every
terminal node of a fixed subtree $S$ is found (where the subtree cost is
assumed to be additive over that of its terminal nodes) for a fixed operating
slope $\lambda$, independently for every block.

• Then (outer minimization), the best basis subtree is determined for each
block independently, again for the fixed operating slope $\lambda$, from
amongst all permissible wavelet packet basis decompositions for the given
wavelet kernel. A fast dynamic programming based pruning operation will be
done to accomplish this (see Section 5.4).

• Finally (outermost maximization), the optimal slope $\lambda^*$ that meets
the given budget criterion $R_{budget}$ is determined as the maximum of
$W(\lambda)$. Lemma 1 facilitates the use of fast search methods for finding
the optimal $\lambda^*$ in an iterative fashion (see Section 6.2).


5.3  Geometric  interpretation 

One insight into the unconstrained optimization problem comes from a
geometric approach. It can be shown that the optimal operating point on the
R-D plane for each leaf node of the tree $T$ for a given slope $\lambda$ is
that point in the collection of R-D points which is first “impinged upon” by
a “plane-wave” of slope $-\lambda$ emanating from the fourth quadrant of the
R-D plane towards the R-D curve in the first quadrant. This is because the
Lagrangian cost $J$ associated with any admissible operating point can be
interpreted as the y-intercept of the straight line of slope $-\lambda$
passing through that point on the operational rate-distortion plane. See
Figure 8. The minimum Lagrangian function (minimum y-intercept) is obviously
achieved for that point which is “hit” first by the plane wave of absolute
slope $\lambda$ impinging on the rate-distortion curve. Note also from Figure
8 that the biased Lagrangian function $W(\lambda)$ can be interpreted as the
intercept, on the budget constraint line, of the straight line of slope
$-\lambda$ tangent to the convex hull of the R-D curve. This geometric
interpretation of the problem makes a lot of the properties itemized earlier,
like monotonicity and existence of singular
Figure 8: Geometric interpretation of the problem. (Axis label recovered
from the figure: distortion.)
slope values, as well as the concavity of $W$, and the solution to the
optimal convex-hull operating point, both intuitively pleasing and easy to
show using straight-line geometry, though a more rigorous algebraic proof is
provided in the appendix.

5.4  Finding  the  best  basis  subtree  for  each  block 

The  difference  between  the  flattened  approach  and  the  hierarchical  approach 
is  in  the  search  for  the  best  basis  for  each  block  of  the  signal.  While  a  flattened 
approach  entails  an  exhaustive  search  of  the  entire  family  of  wavelet  packets 
in  a  “brute  force”  manner,  the  hierarchical  approach  uses  a  fast  “pruning” 
algorithm  to  determine  the  best  basis. 

While motivated by the “entropy” pruning criterion mentioned in [1],
our formulation is in a dynamic programming framework using a Viterbi-like
algorithm. Besides, it must be emphasized that the CMQW cost criterion
used to populate the tree nodes prior to pruning uses a one-sided function
(either rate or distortion), whereas we resort to a Lagrangian cost function,
which is optimal in a rate-distortion sense.

A Viterbi-like fast dynamic programming technique is feasible due to the
ON property of the WP basis family, which enables the signal space spanned by
an arbitrary subtree rooted at internal node $t$ of the tree to be identical
to the space spanned by the twin subtrees rooted at the two branches
emanating from node $t$. To be specific, let $t = n_j^i$, i.e. $t$ is the
$j$th node of the $i$th hierarchical level (or depth) of the tree $T$. Its
two children are $t_1 = n_{2j-1}^{i+1}$ and $t_2 = n_{2j}^{i+1}$. See Figure
1. Then, because of the ON property, the subtrees rooted at $t_1$ and $t_2$
cover disjoint halves of the signal space spanned by their parent node $t$.

This allows a direct quantitative one-to-one comparison between the $N/2^i$
basis coefficients $\{c_j^i\}$ associated with the basis subset $\{b_j^i\}$
of node $t$ with the $(2 \times (N/2^{i+1}))$ coefficients
$\{\{c_{2j-1}^{i+1}\}, \{c_{2j}^{i+1}\}\}$ associated with the basis subsets
$\{b_{2j-1}^{i+1}\}$ and $\{b_{2j}^{i+1}\}$ of nodes $t_1$ and $t_2$
respectively. The “split/merge” decision will be based on which option leads
to a cheaper Lagrangian cost, as spelled out in Figure 4.

Assume known the optimal subtree from node $t = n_j^i$ “onwards” to the full
tree-depth $\log N$. We could liken the subtrees to surviving paths in the
Viterbi algorithm [14]. Then, by Bellman’s optimality principle [15], we know
all surviving paths passing through node $t = n_j^i$ at depth $i$ must invoke
this same optimal “finishing” path. There are two contenders for the
“surviving path” at every node of the tree, the parent and its children, with
the winner having the lower Lagrangian cost.

Using this, we begin at the complete tree-depth $n = \log N$ and work our
way towards the root $t_0$ of the tree, using the above cost criterion at
each level $i$ to determine whether to split or merge. This decision (or
“path”) is remembered and used to determine the best path when applying the
same pruning criterion on the branches, a process which is repeated until the
root is encountered. At this point, the entire best path or best wavelet
packet basis is known.


6  Complete  Algorithm 

The stage is now set to integrate the results of the previous two sections to
formulate the optimal algorithm. This will be done in two phases. First, the
optimal algorithm for a given operating slope $\lambda$ will be flowcharted,
followed by a description of the hunt for the optimal operating slope
$\lambda^*$. Note that the algorithm is applied independently on each signal
block to determine the best wavelet packet basis corresponding to that block.

6.1  Initialization 

Prior to the actual pruning operation, a one-time fixed cost of gathering the
statistics listed in Steps 1 and 2 below must be endured. Associated with
every node $n_j^i$ of $T$ is a data structure of the form:
$\{\tilde{R}_j^i, \tilde{D}_j^i, \tilde{J}_j^i, split(n_j^i)\}$. The first
three members refer to the rate, distortion, and Lagrangian cost associated
with the optimal (for the given $\lambda$) subtree from $n_j^i$ onwards, i.e.
the optimal subtree rooted at $n_j^i$, while the last member of the data
structure, $split(n_j^i)$, is a binary variable whose meaning (yes or no)
reflects the decision of whether or not it is optimal to split the node into
its children branches.

Step 1: Generate the coefficients $\{c_j^i\}$ for the entire WP family.

Step 2: Gather the given quantizer set dependent $\{R_q(t), D_q(t)\}$ values
for all the nodes $t \in T$, $q \in q_a(t)$, to generate the R vs D points
for each node.
Phase I: Optimality For A Given Operating Slope

Phase I of the algorithm is run for a given slope value $\lambda$, and could
be considered a subroutine called by Phase II, described later in the
section, for the fixed budget allocation problem:

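Step 3: For the $\lambda$ of the current iteration, populate all the nodes
$t$ of the tree with the minimum Lagrangian cost function associated with
that node ($J_t(\lambda)$, or equivalently, $J_j^i(\lambda)$ when referring
to the $j$th node at scale $i$), where
$J_j^i(\lambda) = \min_{q \in q_a(t)} [D_q(t) + \lambda R_q(t)]$.

Step 4: Initialize $i \leftarrow n$, where $n = \log N$ is the maximum signal
block tree-depth. For $t = n_j^n$, if $q^*$ is the value of $q$ that
minimizes $J_t(\lambda)$, initialize:
    $\tilde{R}_j^n \leftarrow R_j^n$  (where $R_j^n = R_{q^*}(t)$)
    $\tilde{D}_j^n \leftarrow D_j^n$  (where $D_j^n = D_{q^*}(t)$)
    $\tilde{J}_j^n \leftarrow J_j^n$

Step 5: $i \leftarrow i - 1$. If $i < 0$, go to Step 8.

Step 6: $\forall j = 1, 2, \ldots, 2^i$ at the $i$th tree level:
    if $J_j^i \le \tilde{J}_{2j-1}^{i+1} + \tilde{J}_{2j}^{i+1}$
    then { $split(n_j^i) \leftarrow NO$; $\tilde{R}_j^i \leftarrow R_j^i$; $\tilde{D}_j^i \leftarrow D_j^i$; $\tilde{J}_j^i \leftarrow J_j^i$ }
    else { $split(n_j^i) \leftarrow YES$; $\tilde{R}_j^i \leftarrow \tilde{R}_{2j-1}^{i+1} + \tilde{R}_{2j}^{i+1}$; $\tilde{D}_j^i \leftarrow \tilde{D}_{2j-1}^{i+1} + \tilde{D}_{2j}^{i+1}$; $\tilde{J}_j^i \leftarrow \tilde{J}_{2j-1}^{i+1} + \tilde{J}_{2j}^{i+1}$ }

Step 7: Go to Step 5.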

Step 8: Starting from the root $t_0$, and using, in a linked-list fashion,
the node data-structure element $split$(node), selected optimally for all
the nodes of $T$, carve out the optimal subtree $S^*(\lambda)$ and its
associated optimal quantizer choice
$Q^*(\lambda) \in Q_a(S^*(\lambda))$. Also readily available at the
data-structure for root node $t_0$ are $R_{Q^*}(S^*) = \tilde{R}^0$ and
$D_{Q^*}(S^*) = \tilde{D}^0$, the rate and distortion of the optimal subtree
$S^*(\lambda)$.
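The Phase I steps can be sketched compactly for the Haar toy example. The
sketch below is a recursive reformulation of the bottom-up procedure (the
recursion visits children before the parent, so the split/merge comparisons
are the same), not the paper's flowchart. The function names, the high-pass
sign convention, and the rate model of a fixed number of bits per coefficient
are assumptions.

```python
import math

# (step size, bits per coefficient) for the coarse, medium and fine grades.
STEPS_BITS = [(16, 4), (4, 6), (1, 8)]

def haar_split(c):
    """Assumed Haar convention: high = (odd - even)/sqrt(2), low = sum/sqrt(2)."""
    r = math.sqrt(2.0)
    high = [(c[k + 1] - c[k]) / r for k in range(0, len(c), 2)]
    low = [(c[k] + c[k + 1]) / r for k in range(0, len(c), 2)]
    return high, low

def node_cost(c, lam):
    """Step 3: minimum Lagrangian cost (J, R, D) over the admissible quantizers."""
    best = None
    for step, bits in STEPS_BITS:
        d = sum((v - step * round(v / step)) ** 2 for v in c)
        r = bits * len(c)
        cand = (d + lam * r, r, d)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

def prune(c, lam):
    """Return (J, R, D, basis) for the best subtree rooted at this node.

    `basis` lists the coefficient vectors of the leaves that are kept.
    """
    j, r, d = node_cost(c, lam)
    if len(c) == 1:                       # full tree depth reached (Step 4)
        return j, r, d, [c]
    high, low = haar_split(c)
    jh, rh, dh, bh = prune(high, lam)
    jl, rl, dl, bl = prune(low, lam)
    if j <= jh + jl:                      # Step 6: merge, parent is cheaper
        return j, r, d, [c]
    return jh + jl, rh + rl, dh + dl, bh + bl   # Step 6: split

s = [109, 23, -98, 13]
for lam in (0.0, 0.1, 1.0, 5.0, 50.0):
    j, r, d, basis = prune(s, lam)
    print(lam, round(j, 2), r, round(d, 2), [len(c) for c in basis])
```

Each printed line gives the slope, the Lagrangian cost, rate, and distortion
at the root, and the sizes of the leaf coefficient vectors of the chosen
basis.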


A point to be made is that it is possible to directly incorporate into
the pruning algorithm the cost of segmentation (in terms of overhead bits
of the subtree map to be sent), if an a priori map-representation scheme
is available. For example, if the subtree structure costs one bit per merge
decision, this bit could be included in the Lagrangian cost comparison of
the children nodes with the parent node in Step 6 of the Phase I algorithm
outlined above. However, in our generalized algorithm such subtleties are
not included (in practice, they make negligible difference anyway) as no a
priori map-coding scheme is assumed. It is assumed that the $R_{budget}$
criterion given for the problem is for pure coding expenditure without any
overhead expenses, which may be minimized using entropy coding of the
tree-map if necessary, or by coding only the locations of the non-zero
coefficients, if that is cheaper. In our application to be described later,
we found that the overhead represented a negligible proportion of the total
budget.

6.2  Finding  The  Optimal  Operating  Slope 

The problem of picking the optimal slope value for a given budget criterion
$R_{budget}$ is the subject of Phase II of the algorithm; the iterative
invocations of the Phase I subroutine in search of the optimal operating
slope $\lambda_{opt}$, which satisfies the given budget constraint, are described in
this section.

As was shown in section 5, due to the concavity of $W(\lambda)$ in $\lambda$ (see Figure
9), and since our optimal operating slope $\lambda^*$ is $\max^{-1}(W)$, we can find our
optimal operating point using a fast convex search algorithm like Newton's
method or bisection methods [10]. Equivalently stated, we are interested in
the zero-crossing operating slope of the derivative of $W$, $dW/d\lambda$. Recall that:

$$W(\lambda) = \sum_i \left( \min_{x_i} [D(x_i) + \lambda R(x_i)] \right) - \lambda R_{budget}$$

which implies that, at non-singular values of $\lambda$,

$$dW/d\lambda = \sum_i R_i^*(\lambda) - R_{budget} \qquad (27)$$

where $R_i^*(\lambda)$ is the rate associated with the optimal subtree/quantizer choice
for block $i$. Due to the discrete nature of our problem, $\lambda$ is singular at only
a finite number of points (see Figure 9). Also, as was developed in Theorem
2 (see appendix), the optimal slope $\lambda^*$ which maximizes $W$ corresponds to
a singular value. From Equation (27), at non-singular values of $\lambda < \lambda^*$, we
have $\sum_i R_i^*(\lambda) > R_{budget}$, while for non-singular values of $\lambda > \lambda^*$,
$\sum_i R_i^*(\lambda) < R_{budget}$. This then leads to the iterative fast convex search
algorithm to be described.

As with most iterative solutions, the choice of a good initial operating
point is the key to a fast convergence. Picking the initial $\lambda$ could be a
research topic all unto itself. Some heuristics are mentioned in [4]. In a
predictive image coding environment, for example, a good guess would be
$\lambda_{opt}$ of the previous frame. Assume we have judiciously chosen two values of
$\lambda$, $\lambda_l$ and $\lambda_u$ with $\lambda_l < \lambda_u$, which satisfy the relation:

$$\sum_i R_i^*(\lambda_u) \le R_{budget} \le \sum_i R_i^*(\lambda_l)$$

Note that failure to find any $\lambda_l$, $\lambda_u$ which satisfy the above inequalities
means that the given problem is unsolvable; i.e. the $R_{budget}$ is inconsistent
with the given sets of quantizers. A conservative choice for a solvable problem
would be $\lambda_l = 0$, $\lambda_u = \infty$.

Phase  II:  Iterating  towards  the  optimal  operating  point 

Now the following “main” algorithm can be used to iteratively call the
“subroutine” algorithm of the previous section:

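Step 1: Pick $\lambda_l < \lambda_u$ such that

$$\sum_i R_i^*(\lambda_u) \le R_{budget} \le \sum_i R_i^*(\lambda_l)$$

If the inequality above is an equality for either slope value, stop. We have
an exact solution. Otherwise, proceed to Step 2.

Step 2: $\lambda_{next} \leftarrow \left| \frac{\sum_i [D_i^*(\lambda_l) - D_i^*(\lambda_u)]}{\sum_i [R_i^*(\lambda_l) - R_i^*(\lambda_u)]} \right| + \epsilon$, where $\epsilon$ is a vanishingly small
positive number picked to ensure that the lower rate point is picked if $\lambda_{next}$
is a singular slope value.

Step 3: Run the Phase I optimal algorithm for $\lambda_{next}$.

==> if ($\sum_i R_i^*(\lambda_{next}) = \sum_i R_i^*(\lambda_u)$), then stop. $\lambda^* = \lambda_{next}$.
else if ($\sum_i R_i^*(\lambda_{next}) > R_{budget}$), $\lambda_l \leftarrow \lambda_{next}$. Go to Step 2.
else $\lambda_u \leftarrow \lambda_{next}$. Go to Step 2.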
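
The Phase II iteration can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: a single block is used instead of a sum over blocks, the hypothetical `best_point` function stands in for the Phase I subroutine by searching a short list of made-up (rate, distortion) operating points, and `eps` plays the role of the vanishingly small positive number.

```python
import math


def best_point(points, lam):
    """Stand-in for Phase I: minimize D + lam * R, breaking ties toward lower rate."""
    if math.isinf(lam):
        return min(points, key=lambda p: (p[0], p[1]))
    return min(points, key=lambda p: (p[1] + lam * p[0], p[0]))


def find_operating_slope(points, budget, eps=1e-9):
    """Return (slope, (rate, distortion)) of the convex-hull point meeting the budget."""
    p_l = best_point(points, 0.0)        # lambda_l = 0: highest rate, lowest distortion
    p_u = best_point(points, math.inf)   # lambda_u = infinity: lowest rate
    if p_u[0] > budget:
        raise ValueError("budget is inconsistent with the admissible operating points")
    if p_l[0] <= budget:
        return 0.0, p_l                  # the budget constraint is not binding
    while True:
        # Magnitude of the slope of the chord joining the two bracketing points.
        lam_next = abs((p_l[1] - p_u[1]) / (p_l[0] - p_u[0])) + eps
        p_next = best_point(points, lam_next)
        if p_next[0] == p_u[0]:
            return lam_next, p_next      # converged on a singular slope
        if p_next[0] > budget:
            p_l = p_next                 # still over budget: raise the lower slope
        else:
            p_u = p_next                 # within budget: lower the upper slope


# Hypothetical (rate in bits, distortion) operating points; (22, 12.0) is off the hull.
points = [(32, 0.0), (24, 4.0), (22, 12.0), (20, 13.0), (16, 35.0)]
slope, point = find_operating_slope(points, budget=21)
print(round(slope, 3), point)
```

With these made-up points the search visits slopes near 2.19, 3.88 and 2.25 before stopping at the 20-bit point, which follows the same bracketing pattern as the three iterations of the toy example.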


Toy  Example 

See Figure 10 for a plot of the convex hull to the operational rate-distortion
curve for the given problem. Shown explicitly are the optimal quantizer and
the best basis choice for each operating point, which corresponds to singular
values of $\lambda$, whose sweep from 0 to $\infty$ results in the tracing out of all convex
hull points. The budget constraint line of 21 bits is obviously an inaccessible
convex hull solution, and one has to settle for the convex hull operating
point using 20 bits. Note also the non-monotonic nature of the sequence of
the depths of the best bases subtrees as one sweeps $\lambda$ through all positive
real numbers.

Let us first show an example of how the Phase I algorithm works for
$\lambda = 10$ (to pick a nice number) and show how it leads to the lowest quality
convex hull point of Figure 10, which is picked for all values of $\lambda > 5.43$:

i) Populate the tree with the minimum of all the Lagrangian cost
functionals for $\lambda = 10$ as outlined in Phase I of the algorithm for each node A,
B1, B2, C1, C2, C3, and C4 to get:

Ja=23\  (achieved  with  quantizer  Q3) 

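$J_{B1} = 92.5$ (Q3); $J_{B2} = 102.3$ (Q3)

$J_{C1} = 46.25$ (Q3); $J_{C2} = 52.25$ (Q3); $J_{C3} = 52.25$ (Q3); $J_{C4} = 60.25$ (Q2)

ii) Initialize $i = 2$; $\tilde{J}_{C1} = 46.25$; $\tilde{J}_{C2} = 52.25$; $\tilde{J}_{C3} = 52.25$; $\tilde{J}_{C4} = 60.25$;

iii) $i = 1$; Since $J_{B1} < \tilde{J}_{C1} + \tilde{J}_{C2}$, $split(B1) \leftarrow$ NO;
$J_{B2} < \tilde{J}_{C3} + \tilde{J}_{C4}$ ==> $split(B2) \leftarrow$ NO;
$\tilde{J}_{B1} \leftarrow J_{B1}$; $\tilde{J}_{B2} \leftarrow J_{B2}$

iv) $i = 0$; Since $J_A > \tilde{J}_{B1} + \tilde{J}_{B2}$, $split(A) \leftarrow$ YES;
$\tilde{J}_A \leftarrow \tilde{J}_{B1} + \tilde{J}_{B2}$;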

Figure 10: Composite R-D curve for toy example shown with best basis
and best quantizer choices for all convex-hull points, and with optimal tree
structure for the given budget constraint. Note the non-monotonicity of the
R-D characteristics with tree depth, i.e. non-conformance with the Chou et
al. [6] assumptions.

We thus have our optimal basis subtree (with quantizer choice) for this
value of $\lambda$, as shown as the lowest rate convex hull point of Figure 10. We
now explain in detail the search for $\lambda^*$ for the toy problem with a coding
budget $R_{budget} = 21$ bits. Refer to Figure 10.

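I) Initialize $\lambda_l^{(0)} = 0$; $\lambda_u^{(0)} = \infty$.

$R^*(\lambda_l) = 32$; $R^*(\lambda_u) = 16$;

II) $\lambda_{next} = 2.17 + \epsilon$:

==> $\lambda_l^{(1)} = 2.17$; $\lambda_u^{(1)} = \infty$;

III) $\lambda_{next} = 3.96 + \epsilon$:

$R^*(\lambda_{next}) = 20 < R_{budget} = 21$;

==> $\lambda_l^{(2)} = 2.17$; $\lambda_u^{(2)} = 3.96$;

IV) $\lambda_{next} = 2.48 + \epsilon$;

$R^*(\lambda_{next}) = R^*(\lambda_u) = 20$;

We have converged! ==> $\lambda^* = 2.48$; $D(\lambda^*) = 12.95$; $R(\lambda^*) = 20$.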

7  Image  coding  application  using  quadtree 
segmentation 

We now describe an image processing application of the optimal pruning
algorithm described earlier for the particular case where the tree structure is
a quadtree, and the basis family is the DCT transform. The coding
environment is a modified version of the still-image coding standard, JPEG [8].

7.1  Adaptive  quantization 

One of the keys to achieving good signal compression is to have the
quantization process adapt dynamically to the signal’s non-stationarities. One way
to accomplish this is to have classified quantizers. The concept of classifying
quantizers for VQ applications was introduced in [16], and the use of classification
in transform coding applications is also not new. The idea of classification is
to have an assortment of classes with each class tuned to a particular signal
characteristic. For our image coding application, through empirical toil, we
determined that a good tradeoff between overclassification with subsequent
high side-information overhead cost and being “overstatic” by not adapting
enough was to have 4 classes, each optimized for a particular image
characteristic.

Our four quantizer classes were optimized for (1) a “typical” image
subblock with low frequencies weighted much higher than the perceptually less
sensitive higher frequencies, a la the JPEG suggested matrix; (2) horizontal
edges; (3) vertical edges; and (4) image subblocks which are “white” in their
frequency spectrum, with no discernible favoritism towards any specific
orientation. A point of note when classifying transform coded images is that the
DCT transform is completely symmetric with respect to phase reversals in
the intensity gradients, thus requiring, for example, a single horizontal
quantizer matrix, as opposed to one for a light-to-dark horizontal gradient, and
another for a dark-to-light one. This cuts down the number of classification
quantizers needed, as compared to the application of [16].

The admissible classes of quantizers described above were constructed for
each of 3 hierarchical levels of the DCT basis tree: 4x4 blocks, 8x8 blocks,
and 16x16 blocks for every 16x16 subblock of the original image. As
mentioned earlier, this constraint is in keeping with the perceptual blockiness
requirements [17], as well as the lack of usefulness of the DCT transform for
sizes that are too small.

7.2  Coding  description  and  simulation  results 

The DCT tree was grown to the three hierarchical levels described for every
16x16 subblock into which the image was divided. This is equivalent to a
parallel pruning of the 16x16 subblock trees into which the original image
was divided for independent coding of the subblocks. The optimal pruning
algorithm as described earlier was invoked for each subblock tree.
“Pseudo-JPEG” coding algorithms were followed for the non-8x8 blocks,¹ i.e. DCT
transformation, quantization using classified quantizers, zigzag scanning, and
RL coding of the zero runs, etc. Figure 11 shows comparisons of the
adaptive DCT-quadtree coder versus the baseline JPEG coder plotting the PSNR
(Peak Signal to Noise Ratio), defined as $10 \log_{10}[255^2/(\mathrm{m.s.e.})]$, versus bpp
(bits per pixel) for some typical test images used in the image processing
community. The results are compared with a typical JPEG coder deploying
the suggested JPEG quantization matrix [8]. In formulating the
pseudo-JPEG algorithm for the non-8x8 blocks, the same default Huffman coding
table was used as outlined in the baseline JPEG specification for 8x8 blocks,
and hence is suboptimal in general. This essentially places a lower bound on
the performance of the adaptive coder, which can perform better if Huffman
codes customized to 16x16 and 4x4 blocks are created. The hierarchy of
classified quantizers was tried on both original images as well as difference images.
The difference images were derived from a hybrid DCT-based pyramid coding
scheme [18]. See Figure 12 for performance comparisons. As can be seen, for
the hierarchy of classified quantizers picked and for the pseudo-JPEG
coding scheme invoked, the adaptive DCT-based quadtree coder outperforms
the standard static JPEG coder by about 1.5-2 dB at typical bitrates, or
alternatively, by 15%-25% reduction in bitrate at typical PSNR values. For
difficult images like “Barbara,” even more impressive results were obtained.
As can be seen from the plots, our adaptive scheme outperforms the static
JPEG scheme for this image by about 2-3 dB at fixed bitrate, or equivalently
about 25-35% compression advantage at fixed SNR, over an entire range of
bit rates of interest.

¹ The standard JPEG algorithm is applicable only to 8x8 blocks.
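
The PSNR figure of merit used in these comparisons can be computed as in the short sketch below. The function names are illustrative and images are assumed to be given as flat sequences of 8-bit pixel values, so that the peak value is 255.

```python
import math


def mean_squared_error(original, decoded):
    """Mean squared error between two equally long sequences of pixel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr_db(mse, peak=255.0):
    """PSNR in dB, defined as 10 * log10(peak^2 / mse)."""
    if mse <= 0:
        raise ValueError("PSNR is undefined for a zero or negative mean squared error")
    return 10.0 * math.log10(peak * peak / mse)


# A mean squared error of 650.25 (= 255^2 / 100) corresponds to 20 dB.
print(psnr_db(650.25))
```

Because the scale is logarithmic, a gain of 2 dB at a fixed bitrate corresponds to a reduction of the mean squared error by a factor of about 1.6.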


Figure 11: Comparison of adaptive depth-3 block DCT basis quadtree coding
scheme with non-adaptive JPEG coding scheme for the “Barbara” and “mit”
images.




Figure 12: Comparison of adaptive depth-3 block DCT basis quadtree coding
scheme with non-adaptive JPEG coding scheme for difference images derived
from the “table-tennis” and “mit” images.

8  Conclusion 

We have shown, for a given hierarchy of admissible quantizers, an efficient
scheme for coding adaptive trees whose individual nodes spawn off
descendants forming a disjoint and complete basis cover for the space spanned by
their parent nodes. The scheme presented guarantees operation on the
convex hull of the operational R-D curve for the admissible hierarchy of
quantizers. Applications for this coding technique include the CMQW generalized
multiresolution wavelet packet decomposition, iterative subband coders, and
quadtree structures. An application to image processing involving quadtrees
with a family of DCT bases has been demonstrated in a JPEG-like coding
environment with good improvement shown over the static JPEG coding
scheme.

Appendix

Proof  of  Lemma  1: 

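Proof: Denoting $\lambda_3 = \theta \lambda_1 + (1 - \theta) \lambda_2$, where $0 \le \theta \le 1$, and recalling
the vector notation $x$ to represent the sequence $x_1, \ldots, x_M$, we have:

$$W(\lambda_3) = W(\theta \lambda_1 + (1 - \theta) \lambda_2)$$

$$= \min_{x \in X^M} \{ D(x) + [\theta \lambda_1 + (1 - \theta) \lambda_2] R(x) - \lambda_3 R_b \}$$

$$\ge \min_{x \in X^M} \theta [D(x) + \lambda_1 R(x) - \lambda_1 R_b] + \min_{x \in X^M} (1 - \theta) [D(x) + \lambda_2 R(x) - \lambda_2 R_b]$$

$$= \theta W(\lambda_1) + (1 - \theta) W(\lambda_2) \qquad \Box$$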
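
The concavity established by Lemma 1 can be checked numerically on a small example. The sketch below is only an illustration: it evaluates $W(\lambda)$ for a single block over a hypothetical list of (rate, distortion) operating points and a hypothetical budget, then tests the concavity inequality on a grid of slopes.

```python
def W(points, lam, budget):
    """W(lam) = min over operating points of [D + lam * R] - lam * budget."""
    return min(d + lam * r for r, d in points) - lam * budget


def is_concave_on_grid(points, budget, lams, thetas, tol=1e-9):
    """Check W(t*l1 + (1-t)*l2) >= t*W(l1) + (1-t)*W(l2) for all grid combinations."""
    for l1 in lams:
        for l2 in lams:
            for t in thetas:
                mid = W(points, t * l1 + (1 - t) * l2, budget)
                chord = t * W(points, l1, budget) + (1 - t) * W(points, l2, budget)
                if mid < chord - tol:
                    return False
    return True


# Hypothetical operating points (rate in bits, distortion) and budget.
points = [(32, 0.0), (24, 4.0), (22, 12.0), (20, 13.0), (16, 35.0)]
budget = 21
lams = [0.25 * k for k in range(41)]     # slopes 0, 0.25, ..., 10
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
print(is_concave_on_grid(points, budget, lams, thetas))
print(max(lams, key=lambda lam: W(points, lam, budget)))
```

For these made-up points the maximizer on the grid is 2.25, the slope of the convex hull face joining the 24-bit and 20-bit points on either side of the 21-bit budget.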


Proof  of  Theorem  2: 


Proof: Denote by $\lambda'$ the slope of the convex hull face which “straddles”
the budget constraint line on the R-D plane. See Figure 7. Let us consider
the convex-hull face of slope $\lambda'$ as a candidate for $\lambda^* = \max^{-1}(W(\lambda))$.

For $\lambda < \lambda'$, invoking the “lower rate” operating point on the convex hull
of slope $\lambda'$, $x_l$, we have:

$$W(\lambda) - W(\lambda') = \min_{x \in X^M} [D(x) + \lambda R(x) - \lambda R_b] - [D(x_l) + \lambda' R(x_l) - \lambda' R_b]$$

$$\le [D(x_l) + \lambda R(x_l) - \lambda R_b] - [D(x_l) + \lambda' R(x_l) - \lambda' R_b]$$

$$= (\lambda - \lambda')(R(x_l) - R_b)$$

$$\le 0$$

Similarly, for $\lambda > \lambda'$, invoking the “higher rate” operating point on the
convex hull of slope $\lambda'$, $x_h$, we have:

$$W(\lambda) - W(\lambda') \le (\lambda - \lambda')(R(x_h) - R_b) \le 0$$

Thus, for all positive values of $\lambda$, $W(\lambda) \le W(\lambda')$.

But, by virtue of Lemma 1, we know that $W(\lambda)$, being concave, has a
unique maximum value which occurs at a singular slope. (If it were
non-singular, then there exists an $\epsilon > 0$, no matter how small, for which
$W(\lambda^* + \epsilon) - W(\lambda^*) > 0$, which contradicts the definition of $W(\lambda^*)$.)

Thus, $\lambda'$ is indeed this unique singular maximum, with the optimal convex
hull operating point obviously being $x_l$. $\Box$
Abstract

Recent work has shown that perfect reconstruction filter banks can be used to derive
continuous-time bases of wavelets; the case of finite impulse response filters, which lead
to compactly supported wavelets, has been examined in detail. In this paper we show
that infinite impulse response filters lead to more general wavelets of infinite support.
We give a complete constructive method which yields all orthogonal two channel filter
banks, where the filters have rational transfer functions, and show how these can
be used to generate orthonormal wavelet bases. A family of orthonormal wavelets,
which shares with those of Daubechies the property of having a maximum number of
disappearing moments, is shown to be generated by the halfband Butterworth filters.
When there is an odd number of zeros at $\pi$ we show that closed forms for the filters
are available without need for factorization. A still larger class of orthonormal wavelet
bases having the same moment properties is presented, and contains the Daubechies
and Butterworth filters as the boundary cases. We then show that it is possible to have
both linear phase and orthogonality in the infinite impulse response case, and give a
constructive method. We show how compactly supported bases may be orthogonalized,
and construct bases for the spline function spaces. These are alternatives to those of
Battle and Lemarié, but have the advantage of being based on filter banks where the
filters have rational transfer functions and are thus realizable. Design examples are
presented throughout.

1  Introduction

The subject of wavelets has been studied by applied mathematicians for a number of years,
as representing an alternative to traditional Fourier based analysis techniques. Considerable
interest has been shown by the Signal Processing community more recently owing, in large
measure, to the influence of pivotal papers by Mallat [1, 2] and Daubechies [3]. These
demonstrate the strong link between the subject of Wavelets and that of multirate filter
banks. Briefly put, multirate filter banks give the structures required to generate important
cases of wavelets and the wavelet transform. Wavelets provide an elegant mathematical
basis for multirate filter banks, and that mathematical machinery has led to new results [4].

Among the most celebrated wavelet bases are those of Meyer [5] and of Battle and
Lemarié [6, 7], and these can be realized using orthogonal multirate filter banks; however the
filters involved are not rational, and the corresponding wavelet cannot be computed exactly,
so they are of limited worth from a Signal Processing point of view. More interesting are
the compactly supported wavelets of Daubechies [3]. These are based on orthogonal Finite
Impulse Response (FIR) filter banks, which have in fact been under study for some time
[8, 9, 10].

Our principal interest in this paper is again orthogonal filter banks and their relation to
wavelet bases. We consider Infinite Impulse Response (IIR) filter banks, which have been
less studied, and which allow much greater freedom than their FIR counterparts.

The essential contribution of the paper consists of strong new results on orthogonal IIR
filter banks which allow us to thoroughly examine the structure of possible solutions, and
present new designs. The connection with Wavelets allows us to use these designs to get
novel orthogonal wavelets, which are based on structures that are computable with finite
complexity. Thus we present filters that are of interest in their own right, but which also
allow us to generate wavelet bases which are in some senses comparable, and in others
superior, to those already published.

The outline of the paper is as follows. We present a succinct review of the relation
between orthonormal wavelet bases and filter banks having orthogonality properties in
section 2. We recall that designing a certain class of orthonormal wavelet bases is related to
the simpler problem of designing orthogonal filter banks, provided that the filters satisfy
certain regularity conditions. Since this material has been reviewed in a number of papers
our treatment is limited to the essential points. Readers unfamiliar with the subject might
consult [8, 11, 12] for additional coverage of filter banks, and [13, 3, 5, 4, 14] for treatments
of the connection with wavelets.

In section 3 we present a constructive method to find all orthogonal filter banks, where
the filters have rational transfer functions. In certain cases of considerable interest, we
actually get closed form expressions for the filters, so that no factorization or approximation
is necessary. This contrasts sharply with the FIR case, and seems to be the first closed form
for a non-trivial implementable wavelet.

Section 4 demonstrates that wavelets with moment properties are derived from filter
banks where the filter frequency responses are maximally flat. We construct the whole
family of maximally flat filters for orthogonal filter banks, and show that the Butterworth
halfband filters and the Daubechies solutions are included as special cases.

Section 5 illustrates that linear phase and orthogonality are not mutually exclusive
properties for IIR filter banks, as they were in the FIR case. Filters of considerable interest can
be designed that lead to orthogonal wavelets with symmetry.

We show in section 6 that if a compactly supported basis for one of the spaces in the
multiresolution analysis structure exists, then we can always generate an orthogonal basis
from realizable IIR filters. A special case is the $N$-th order spline function space; so we
construct bases which have the advantage over those of Battle and Lemarié that they are
realizable.

Certain  of  the  results  have  been  presented  in  preliminary  form  in  [15,  16,  17]. 


1.1  Notation 

The set of real numbers will be represented by $R$, the set of integers by $Z$. The inner product
over the space of square-summable sequences $l^2(Z)$ is:

$$\langle a(n), b(n) \rangle = \sum_{n=-\infty}^{\infty} a^*(n) b(n),$$

where $a(n), b(n) \in l^2(Z)$, and superscript $*$ denotes complex conjugation. Generally we
shall deal with sequences and functions that are real. We define $\| a(n) \|_2^2 = \langle a(n), a(n) \rangle$.
The z-transform of a sequence is defined by $H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n}$. In an abuse of notation
we shall use the same symbol for the Discrete Fourier Transform $H(e^{j\omega})$.
Similarly over the space of square-integrable functions $L^2(R)$ we have the inner product:

$$\langle f(x), g(x) \rangle = \int_{-\infty}^{\infty} f^*(x) g(x) \, dx,$$

where $f(x), g(x) \in L^2(R)$. The squared norm is given by $\| f(x) \|_2^2 = \langle f(x), f(x) \rangle$. For
continuous-time functions we will use subscripts to denote affine variable changes where the
scales are powers of 2 as follows: $f_{jk}(x) = 2^{-j/2} \cdot f(2^{-j} x - k)$.

Our main interest is with filters that have z-transforms that can be written as $H(z) =
z^k A(z)/B(z)$ for some $A(z)$ and $B(z)$ which are polynomials in $z^{-1}$. Since we deal with
both causal and anticausal filters we shall often have positive and negative powers of $z$. A
function that has terms in both $z$ and $z^{-1}$ is not a polynomial, but we refer to it as an FIR
function provided that it has a finite number of terms. The following shorthand notation for
a causal FIR function of length $N$ is used: $\sum_{n=0}^{N-1} a_n z^{-n} = (a_0, a_1, a_2, \ldots, a_{N-1})$.

In the following we shall refer to any symmetric or antisymmetric filter that has a central
term as having whole sample symmetry (WSS) or whole sample antisymmetry (WSA), and
one that does not have a central term as having half sample symmetry or antisymmetry
(HSS or HSA). In the case of FIR filters WSS and WSA correspond to filters of odd length,
and are often referred to as Type I and Type III filters respectively [18]; whereas HSS and HSA
imply filters of even length of Type II and Type IV respectively. Some of the basic properties
of symmetric sequences that we will have need of are reviewed in appendix A.1.


2  Wavelets  and  filter  banks 

2.1  Multiresolution  signal  processing 

The material of this section can also be found in [3, 1, 4, 19]. Two texts give very
comprehensive treatments [5, 20]; a more tutorial approach is given in [21].

2.1.1  Continuous-time bases for multiresolution analysis

The axiomatic description of a multiresolution analysis scheme, as introduced by Mallat and
Meyer [22, 2, 5], is that we should have:

(i) A succession of spaces:

$$\cdots V_2 \subset V_1 \subset V_0 \subset V_{-1} \cdots, \qquad (1)$$

where the union of all the $V_j$'s is $L^2(R)$, and the intersection of all of the spaces contains
only the origin,

(ii) $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j-1}$,

(iii) $\exists \, \phi(x) \in V_0$ such that the set $\phi(x - n), n \in Z$ constitutes an orthonormal basis for
$V_0$.¹

It follows that the set $\{ \phi_{jk} = 2^{-j/2} \cdot \phi(2^{-j} x - k), k \in Z \}$ is an orthonormal basis for $V_j$.

Next, let $W_j$ be the orthogonal complement of $V_j$ in $V_{j-1}$, that is $x \in V_j$, $y \in W_j \Rightarrow
\langle x, y \rangle = 0$ and $V_{j-1} = V_j \oplus W_j$. Obviously $V_j \subset V_{j-1}$ and $W_j \subset V_{j-1}$, so that the
basis functions of $V_j$ and $W_j$ ($\phi_{jk}(x)$ and $\psi_{jk}(x)$ respectively) can be written as a linear
combination of the basis functions of $V_{j-1}$ ($\phi_{j-1,k}(x)$). This gives the relations:

$$\phi(x) = 2^{1/2} \cdot \sum_{n=-\infty}^{\infty} h_0(n) \cdot \phi(2x - n), \qquad (2)$$

$$\psi(x) = 2^{1/2} \cdot \sum_{n=-\infty}^{\infty} h_1(n) \cdot \phi(2x - n), \qquad (3)$$

where infinitely many of the $h_0(n)$ and $h_1(n)$ may differ from zero. Since (2) relates $\phi(x)$
and $\phi(2x)$ it is called a two-scale difference equation. Note that $\phi(x)$ and $\psi(x)$ are called
the scaling function and wavelet respectively.

Because $V_j$ and $W_j$ are orthogonal we find:

$$\langle \phi(x), \psi(x - k) \rangle = 0 = \langle h_0(n), h_1(n - 2k) \rangle. \qquad (4)$$

So the two sequences $h_0(n)$ and $h_1(n)$ must be orthogonal with respect to shifts by two.
Equally, by imposing the constraint that the bases for $V_j$ and $W_j$ be orthogonal:

$$\langle \phi(x), \phi(x - k) \rangle = \delta_k = \langle \psi(x), \psi(x - k) \rangle,$$

we find that:

$$\langle h_0(n), h_0(n - 2k) \rangle = \delta_k = \langle h_1(n), h_1(n - 2k) \rangle. \qquad (5)$$
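
Conditions (4) and (5) are easy to verify numerically for a concrete filter pair. The sketch below is an illustration only: it uses the well-known four-tap Daubechies lowpass filter as $h_0(n)$ and, as an assumption not taken from the text above, builds $h_1(n)$ by the usual alternating-sign time reversal of $h_0(n)$.

```python
import math


def shifted_inner(a, b, shift):
    """Inner product <a(n), b(n - shift)> for real, finite, causal sequences."""
    return sum(a[n] * b[n - shift]
               for n in range(len(a))
               if 0 <= n - shift < len(b))


s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
# Four-tap Daubechies lowpass filter.
h0 = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]
# Assumed highpass companion: alternating-sign time reversal of h0.
h1 = [(-1) ** n * h0[len(h0) - 1 - n] for n in range(len(h0))]

for k in (-1, 0, 1):
    delta = 1.0 if k == 0 else 0.0
    print(k,
          round(shifted_inner(h0, h0, 2 * k) - delta, 12),   # condition (5), lowpass
          round(shifted_inner(h1, h1, 2 * k) - delta, 12),   # condition (5), highpass
          round(shifted_inner(h0, h1, 2 * k), 12))           # condition (4)
```

All printed deviations should be zero to within rounding, confirming that each filter is orthonormal with respect to its own even shifts and orthogonal to the even shifts of the other.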


2.2  Wavelets  derived  from  filter  banks 

We have seen above that the multiresolution analysis scheme with orthogonal basis functions
satisfying (2) and (3) implies certain restrictions on the related sequences $h_0(n)$ and $h_1(n)$;
that is (4) and (5) must hold. Also, since $\langle \phi(2x - k), \phi(x) \rangle = 2^{-1/2} \cdot h_0(k)$, and
$\langle \phi(2x - k), \psi(x) \rangle = 2^{-1/2} \cdot h_1(k)$, it is obvious that once the basis functions $\phi(x)$ and
$\psi(x)$ are known the related filters are easily found. However it is not yet obvious how
functions satisfying the desired constraints may be found.

¹ Actually it is sufficient to have a basis, which can then be orthogonalized.

2.2.1  Limit functions of filter banks

A way to construct $\phi(x)$ and $\psi(x)$ from the associated discrete sequences was first shown
by Daubechies in [3]. Essentially it entails considering the limit of a sequence of functions
$f^{(i)}(x)$ which are piecewise constant on intervals of length $1/2^i$. The value of the constant
is equal to the coefficient of a filter found by cascading $i$ copies of the filter $H_0(z)$ followed
by a subsampler [4].
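
The iteration just described can be sketched numerically. The code below is a minimal illustration under stated assumptions: the cascade of $i$ filter-and-subsample stages is replaced by the equivalent single filter $H_0(z) H_0(z^2) \cdots H_0(z^{2^{i-1}})$, and the coefficients are scaled by $2^{i/2}$ (a normalization assumed here) so that the piecewise constant function has unit integral. The function names are illustrative.

```python
import math


def upsample(h, m):
    """Insert m - 1 zeros between consecutive samples of h."""
    out = [0.0] * ((len(h) - 1) * m + 1)
    out[::m] = h
    return out


def convolve(a, b):
    """Plain linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cascade(h0, i):
    """Values of the i-th piecewise constant iterate on intervals of length 1/2**i."""
    g = [1.0]
    for k in range(i):
        g = convolve(g, upsample(h0, 2 ** k))
    scale = 2.0 ** (i / 2.0)
    return [scale * c for c in g]


haar = [1 / math.sqrt(2.0), 1 / math.sqrt(2.0)]
print(cascade(haar, 3))                 # eight values close to 1: the box function

s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
d4 = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]
values = cascade(d4, 6)
print(sum(values) / 2 ** 6)             # approximate integral of the iterate
```

For the two-tap Haar filter the iterate is the indicator of the unit interval at every level, while for the four-tap Daubechies filter the values approach samples of a continuous scaling function supported on an interval of length 3.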

Assuming for the moment that the limit exists, and that the filters $h_0(n)$, $h_1(n)$ satisfy
the orthogonality constraints (4) and (5), we have as $i \to \infty$

$$f^{(\infty)}(x) = 2^{1/2} \cdot \sum_{m=-\infty}^{\infty} h_0(m) f^{(\infty)}(2x - m). \qquad (6)$$

Taking the Fourier transform:

$$F^{(\infty)}(\omega) = 2^{-1/2} \cdot H_0(e^{j\omega/2}) \cdot F^{(\infty)}(\omega/2). \qquad (7)$$

Now define $M_0(e^{j\omega}) = 2^{-1/2} \cdot H_0(e^{j\omega})$ so that:

$$F^{(\infty)}(\omega) = \prod_{k=1}^{\infty} M_0(e^{j\omega/2^k}). \qquad (8)$$

Now also consider the related function:

$$G^{(\infty)}(\omega) = 2^{-1/2} \cdot H_1(e^{j\omega/2}) \cdot F^{(\infty)}(\omega/2),$$

so that:

$$g^{(\infty)}(x) = 2^{1/2} \cdot \sum_{m=-\infty}^{\infty} h_1(m) f^{(\infty)}(2x - m). \qquad (9)$$

On comparing (6) and (2), (9) and (3) we now see that $f^{(\infty)}(x)$ and $g^{(\infty)}(x)$ satisfy the
two-scale difference equations required of the orthonormal wavelet construction. It can be
verified that the orthogonality relations of the functions $f^{(\infty)}(x)$ and $g^{(\infty)}(x)$ follow from
the orthogonality of the filters [3, 4].

Thus the problem of finding the basis functions for the wavelet scheme is reduced to one
of finding appropriate pairs of sequences $h_0(n)$ and $h_1(n)$. For much of the rest of the paper,
we will concern ourselves with this.

2.2.2  Regularity 

That the infinite product (8) converges as $i \to \infty$ cannot be taken for granted. Cases where
convergence fails altogether, or where the product converges to a discontinuous function, are
easily found. We would like some guarantees about the convergence of (8) and the continuity
of the functions $\phi(x)$ and $\psi(x)$ when they exist. Exactly such a criterion is derived in [3],
and is reviewed below.

First factor $M_0(z)$ into its roots at $z = -1$ (if there is not at least one then the infinite
product cannot converge [23, 24]) and a remainder function $K(z)$, in the following way:

$$M_0(z) = [(1 + z^{-1})/2]^N K(z).$$

Note that it can be shown that $K(1) = 1$ from the definitions; i.e. (5) gives $H_0(1) = \sqrt{2}$, so that
$M_0(1) = 1$. Now call $B$ the supremum of $|K(z)|$ on the unit circle: $B = \sup_{\omega \in [0, 2\pi]} |K(e^{j\omega})|$.

Then the following sufficient, but not necessary, test from [3] can be used:

Proposition 2.1 (Daubechies 1988) If $B < 2^{N-1}$, then the piecewise constant function
$f^{(i)}(x)$ defined in (8) converges pointwise to a continuous function $f^{(\infty)}(x)$.
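
As an illustration of how the test might be applied, the sketch below factors the zeros at $z = -1$ out of $M_0(z)$ for the four-tap Daubechies filter, for which $N = 2$, and estimates $B$ on a uniform grid of frequencies. The helper names and the grid size are choices made here for the example.

```python
import cmath
import math


def divide_out_zero_at_minus_one(m):
    """Divide m(z), given by coefficients in powers of z^-1, by (1 + z^-1)/2."""
    q = []
    prev = 0.0
    for c in m[:-1]:
        prev = 2.0 * c - prev
        q.append(prev)
    if abs(2.0 * m[-1] - prev) > 1e-9:
        raise ValueError("the polynomial has no zero at z = -1")
    return q


def sup_on_unit_circle(k, samples=4096):
    """Largest magnitude of K(e^{jw}) over a uniform grid of frequencies."""
    best = 0.0
    for s in range(samples):
        w = 2.0 * math.pi * s / samples
        value = abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(k)))
        best = max(best, value)
    return best


s3 = math.sqrt(3.0)
# M0(z) = 2^(-1/2) H0(z) for the four-tap Daubechies filter.
m0 = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]
N = 2
k = m0
for _ in range(N):
    k = divide_out_zero_at_minus_one(k)

B = sup_on_unit_circle(k)
print(k, B, B < 2 ** (N - 1))
```

Here the remainder has two coefficients that sum to one and $B$ equals $\sqrt{3} \approx 1.73$, which is below $2^{N-1} = 2$, so the test is satisfied for this filter.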


2.3  Filter  banks 

We now put the connection between filter banks and wavelets to work. Our interest in
this paper is orthogonal wavelet bases, hence we restrict our attention to orthogonal filter
banks. More general perfect reconstruction filter banks give rise to biorthogonal systems just
as in the FIR case [4]. We assume some familiarity with the basic properties of multirate
operations; these are detailed for example in [8, 11, 12].


2.3.1  Perfect  reconstruction 


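The structure shown in Figure 1 is a maximally decimated two channel multirate filter bank.
If $\hat{X}(z) = X(z)$ the filter bank has the perfect reconstruction property, and we refer to it as
a PRFB. We now make the following choice for the synthesis filters:

$$[G_0(z), G_1(z)] = [H_0(z^{-1}), H_1(z^{-1})] \qquad (10)$$

and choose $H_1(z) = z^{-1} H_0(-z^{-1})$. It is easily shown that the output $\hat{X}(z)$ of the overall
analysis/synthesis system is then given by:

$$\hat{X}(z) = \frac{1}{2} \begin{bmatrix} G_0(z) & G_1(z) \end{bmatrix} \cdot \begin{bmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{bmatrix} \cdot \begin{bmatrix} X(z) \\ X(-z) \end{bmatrix} \qquad (11)$$

$$= \frac{1}{2} [H_0(z) H_0(z^{-1}) + H_0(-z) H_0(-z^{-1})] \cdot X(z). \qquad (12)$$

So for this arrangement of the filters it is clear that we get perfect reconstruction provided:

$$H_0(z) H_0(z^{-1}) + H_0(-z) H_0(-z^{-1}) = 2.$$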

The  importance  of  this  construction  is  established  by  the  next  lemma. 

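We first introduce additional notation that we will need. The 2x2 matrix in (11) is
called H_m(z), the modulation matrix of the system. The following polyphase notation for
the filters is standard [8, 11, 12]:

    H_i(z) = H_{i0}(z^2) + z^{-1} H_{i1}(z^2),

that is h_{i0}(n) contains the even-indexed coefficients of the filter h_i(n), while h_{i1}(n) contains
the odd ones. Thus:

    \begin{bmatrix} H_{00}(z^2) & H_{01}(z^2) \\ H_{10}(z^2) & H_{11}(z^2) \end{bmatrix}
        = \frac{1}{2}
          \begin{bmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{bmatrix}
          \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
          \begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}.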

The matrix on the left hand side is called the polyphase matrix H_p(z^2).

Lemma 2.2 The following are equivalent:

(a) [H_m(z^{-1})]^T \cdot H_m(z) = 2 \cdot I,

(b) [H_p(z^{-1})]^T \cdot H_p(z) = I,

(c) H_0(z) H_0(z^{-1}) + H_0(-z) H_0(-z^{-1}) = 2 and H_1(z) = z^{-1} H_0(-z^{-1}) A(z^2), where A(z) is
allpass,

(d) <h_i(n), h_i(n - 2k)> = \delta_k and <h_0(n), h_1(n - 2k)> = 0 for all k \in Z.

A  proof  can  be  found  in  [25].  It  is  also  proved  that  the  choice  of  synthesis  filters  (10)  is 
unique  for  the  orthogonal  construction. 

Because of the impulse response relations in (d) we shall refer to any filter bank satisfying
the conditions of Lemma 2.2 as orthogonal; in the filter bank literature the terms orthogonal,
paraunitary and lossless are often used interchangeably [26]. Observe that in Lemma 2.2(c)
we use functions of the form H_i(z) H_i(z^{-1}), which are called autocorrelation or positive real
functions. It deserves mention that the study of lossless systems and positive real functions
has a long history in both circuit theory and signal processing [27, 28, 29, 26]. When we
wish to impose orthogonality on the filter bank to be used, we shall use whichever of the
equivalent conditions of Lemma 2.2 is most convenient.
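
The impulse response relations of Lemma 2.2(d) can be checked numerically for a concrete filter pair. The following is a minimal sketch, not taken from the paper: it assumes the 4-tap Daubechies lowpass filter as an illustrative orthogonal FIR example, builds the highpass filter by time reversal and sign alternation, and uses a hypothetical helper `inner_shift` for the shifted inner products.

```python
import numpy as np

# Assumed example: the 4-tap Daubechies lowpass filter (orthogonal FIR).
s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

# Highpass h1(n) = (-1)^(3-n) h0(3-n), i.e. H1(z) = z^{-3} H0(-z^{-1}).
h1 = h0[::-1] * np.array([-1.0, 1.0, -1.0, 1.0])


def inner_shift(a, b, k):
    """Return <a(n), b(n - 2k)> for finite sequences that start at n = 0."""
    total = 0.0
    for n in range(len(a)):
        m = n - 2 * k
        if 0 <= m < len(b):
            total += a[n] * b[m]
    return total


shifts = list(range(-2, 3))
auto0 = [inner_shift(h0, h0, k) for k in shifts]   # expect delta_k
auto1 = [inner_shift(h1, h1, k) for k in shifts]   # expect delta_k
cross = [inner_shift(h0, h1, k) for k in shifts]   # expect all zeros
```

Only the even shifts 2k enter the test, which is why a filter can be orthogonal in this sense without being orthogonal to its odd shifts.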

Note also that if we define P(z) = H_0(z) H_0(z^{-1}) then (c) requires that in addition to
being an autocorrelation, P(z) satisfies

    P(z) + P(-z) = 2.    (14)

Since this condition plays an important role in what follows, we will refer to any function
having this property as valid. Much of the focus of the paper will be in designing autocorrelation
functions that are valid. We shall be interested only in valid functions that are
rational, so that they can be factored into rational filters H_0(z) and H_0(z^{-1}). These filters
can then be implemented using recursive difference equations [18], whereas filters that do
not have rational transfer functions have no finite complexity physical implementation.


3 Orthogonal IIR filter banks

We have already seen that constructing an orthogonal filter bank can be reduced to the task
of finding a function P(z) which is a valid autocorrelation; that is a function that satisfies
(14) and can be factored as P(z) = H(z) H(z^{-1}). We first establish an important preliminary
result on the form of valid rational functions.

Lemma 3.1 If a valid rational function P(z) has no common factors between the numerator
and denominator, then the denominator is one of the two upsampled polyphase components
of the numerator.

Proof: We can write:

    P(z) = \sum_{n=-\infty}^{\infty} [p(2n) z^{-2n} + p(2n+1) z^{-(2n+1)}],

so the constraint gives:

    P(z) + P(-z) = 2 \sum_{n=-\infty}^{\infty} p(2n) z^{-2n} = 2  \Rightarrow  p(2n) = \delta_n.    (15)

Clearly:

    P(z) = 1 + z^{-1} \sum_{n=-\infty}^{\infty} p(2n+1) z^{-2n} = 1 + z^{-1} F(z^2).    (16)

If F(z^2) has no common factors between its numerator and denominator, then they must each
be functions of z^2, possibly multiplied by some delay. That is F(z^2) = z^{-l} N(z^2) / (z^{-k} D(z^2)).
So we have:

    P(z) = \frac{z^{-k} D(z^2) + z^{-l-1} N(z^2)}{z^{-k} D(z^2)}.

Thus the denominator is the first polyphase component if k is even, and the second if k is
odd. The numerator and denominator of P(z) are coprime if and only if N(z) and D(z) are.
□


3.1  Structure  of  the  solutions 

Clearly Lemma 3.1 gives a simple method to design a rational function P(z) which is valid.
Lemma 2.2 then shows that this can be used to give an orthogonal filter bank if this function
is an autocorrelation, that is it can be factored as P(z) = H(z) H(z^{-1}), since the essential
requirement of Lemma 2.2(c) is that H_0(z) H_0(z^{-1}) be valid. The next theorem puts these
parts together and shows how to design valid autocorrelation functions, and hence orthogonal
filter banks. Its utility is that it is constructive and complete.


Theorem 3.2 All orthogonal rational two channel filter banks can be formed as follows:

(i) Choosing an arbitrary polynomial R(z), form:

    P(z) = \frac{2 R(z) R(z^{-1})}{R(z) R(z^{-1}) + R(-z) R(-z^{-1})},    (17)

(ii) Factor as P(z) = H(z) H(z^{-1}),

(iii) Form the filter H_0(z) = A_0(z) H(z), where A_0(z) is an arbitrary allpass,

(iv) Choose H_1(z) = z^{2k-1} H_0(-z^{-1}) A_1(z^2), where A_1(z) is again an arbitrary allpass,

(v) Choose G_0(z) = H_0(z^{-1}), and G_1(z) = H_1(z^{-1}).

Proof: From Lemma 2.2(c) it is necessary and sufficient to find a valid rational autocorrelation
function P(z); since once this is factored as P(z) = H_0(z) H_0(z^{-1}) then H_1(z) is
specified by Lemma 2.2(c), and G_0(z) and G_1(z) by (10).

We show first that (17) always gives a valid, rational autocorrelation. It is valid, since:

    P(z) + P(-z) = \frac{2 R(z) R(z^{-1}) + 2 R(-z) R(-z^{-1})}{R(z) R(z^{-1}) + R(-z) R(-z^{-1})} = 2.

It is clearly rational, R(z) being a polynomial. The numerator of (17) is an autocorrelation;
so is the denominator, since it is the sum of two autocorrelations R(z) R(z^{-1}) and
R(-z) R(-z^{-1}). Hence P(z) itself is an autocorrelation and can be factored:

    P(z) = H_0(z) H_0(z^{-1}) = H(z) H(z^{-1}) \cdot A_0(z) A_0(z^{-1}),

for some H(z) and an arbitrary rational allpass A_0(z).

Next we show that any valid rational autocorrelation can be written as in (17) for some
polynomial R(z).

First, any common factors between the numerator and denominator of the given function
can be cancelled; the result is clearly still a valid rational autocorrelation. So it can be written

    \frac{R(z) R(z^{-1})}{B(z) B(z^{-1})}

for some polynomials R(z) and B(z). Now we can use Lemma 3.1 to get that the denominator,
B(z) B(z^{-1}), is one of the upsampled polyphase components of the numerator:

    D_0(z^2) = [R(z) R(z^{-1}) + R(-z) R(-z^{-1})]/2,

or

    D_1(z^2) = [R(z) R(z^{-1}) - R(-z) R(-z^{-1})]/2.

Note that R(z) R(z^{-1}) is always of odd length and is symmetric. It follows that one of its
upsampled polyphase components, D_0(z^2), is whole sample symmetric (WSS), while D_1(z^2) is
half sample symmetric (HSS). Since half sample symmetric polynomials always have at least
one zero at z = -1 (see appendix A.1), D_1(z^2) is not a suitable choice for the denominator,
as we wish to avoid poles on the unit circle. We therefore have that:

    P(z) = \frac{2 R(z) R(z^{-1})}{R(z) R(z^{-1}) + R(-z) R(-z^{-1})}.    (18)


Note: The introduction of the allpass factors A_0(z) and A_1(z) affects only the phase of the
filters to be implemented, and not their magnitudes. Equally, in the factorization required by
step (ii) there is considerable choice for the phase of the filters; H(z) could be minimum phase
or maximum phase or mixed phase. The magnitude of course does not change. Irrational
orthogonal factorizations of a rational P(z) function are also possible. We give an example
in section 4.4.

The theorem shows that if R(z) ranges over the polynomials then (18) is complete for
rational P(z) functions. If R(z) is chosen to be any function, rational or not, it is clear
by inspection that (18) will still be a valid autocorrelation, but not in general rational.
Completeness is less obvious in this case.
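
The validity of the construction in Theorem 3.2(i) can be spot-checked on the unit circle, where R(z)R(z^{-1}) reduces to |R(e^{jw})|^2 for real coefficients. The sketch below is illustrative only: the coefficients in `r` are an arbitrary assumed choice, and the helper names `poly_on_circle` and `p_from_r` are hypothetical.

```python
import numpy as np


def poly_on_circle(coeffs, w):
    """Evaluate R(z) = sum_n coeffs[n] z^{-n} at z = exp(jw)."""
    n = np.arange(len(coeffs))
    return np.exp(-1j * np.outer(w, n)) @ np.asarray(coeffs, dtype=float)


def p_from_r(coeffs, w):
    """Evaluate P(z) = 2 R R~ / (R R~ + R(-z) R~(-z)) on the unit circle."""
    num = np.abs(poly_on_circle(coeffs, w)) ** 2           # R(z) R(1/z)
    alt = np.abs(poly_on_circle(coeffs, w + np.pi)) ** 2   # R(-z) R(-1/z)
    return 2.0 * num / (num + alt)


r = [1.0, 0.7, -0.2, 0.4]                 # arbitrary real polynomial R(z)
w = np.linspace(0.0, np.pi, 257)
p = p_from_r(r, w)
valid_sum = p + p_from_r(r, w + np.pi)    # P(z) + P(-z)
```

Here `p` is nonnegative on the circle, as an autocorrelation must be, and `valid_sum` equals 2 at every frequency regardless of the choice of `r`, provided R(z) and R(-z) have no common zero on the circle.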

If R(z) R(z^{-1}) is itself valid, that is R(z) R(z^{-1}) + R(-z) R(-z^{-1}) = 2, and A_0(z) and
A_1(z) are both chosen to be delays, then all of the filters specified by Theorem 3.2 are FIR.
The synthesis filters are always time reversed versions of the analysis filters, just as in the
orthogonal FIR case [4]. All of the FIR orthogonal filter banks can be implemented in a
paraunitary lattice structure [10]; a similar result is true for IIR orthogonal filter banks [30],
so an efficient and numerically robust implementation is always available.

3.2  Closed  form  factorization 

Theorem 3.2 establishes the importance of valid rational functions which are autocorrelations.
Numerical factorization poses certain difficulties however. This is certainly a problem
in the FIR case; for example even when P(z) is known exactly, the accuracy with which the
coefficients of H_0(z) can be determined is dependent on the numerical robustness of the root
extraction procedure.

We now show that in the special case where R(z) is symmetric and of even length,
a closed form factorization is available. The requirement that R(z) be symmetric is very
reasonable, since the numerator has to control the stopband of the filter H(z) and typically
has all of its zeros on the unit circle; if this is so, then R(z) is symmetric provided that
it is real. For example all of the digital Butterworth, Chebyshev and elliptic filters have
symmetric numerators.

Consider a causal symmetric FIR function R(z) of even length N + 1. Using the relationship
between the polyphase components given in fact A.1 in the appendix,
R_1(z) = R_0(z^{-1}) z^{-(N-1)/2}, we can simplify:

    R(z) = R_0(z^2) + z^{-1} R_1(z^2) = R_0(z^2) + z^{-N} R_0(z^{-2}).    (19)

This gives:

    R(z) R(z^{-1}) = 2 R_0(z^2) R_0(z^{-2}) + z^{N} [R_0(z^2)]^2 + z^{-N} [R_0(z^{-2})]^2.

Clearly, since N is odd:

    D_0(z^2) = 2 R_0(z^2) R_0(z^{-2}).

And hence

    P(z) = \frac{R(z) R(z^{-1})}{2 R_0(z^2) R_0(z^{-2})}.

It is now obvious that one possible choice for factorizing P(z) is:

    H(z) = \frac{R(z)}{\sqrt{2} R_0(z^2)}.    (20)

Since R(z) and R_0(z^2) are known exactly, this is a closed form; so H(z) is directly available.
Example 4.1 below illustrates this. The importance of this result can be seen by noting that
the coefficients of the wavelet expansion can be obtained exactly, since they do not depend
on any numerical procedure to find the transfer functions H_0(z) and H_1(z). This appears to
be the first closed form for the filters used to generate a non-trivial realizable wavelet.
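
The closed form (20) and its allpass form (21) can be compared numerically. The following is a minimal sketch under stated assumptions: R(z) = (1 + z^{-1})^3 is chosen only as a small symmetric even-length example, the filters are evaluated on the unit circle rather than run in the time domain, and the helper `eval_poly` is hypothetical.

```python
import numpy as np


def eval_poly(coeffs, z):
    """Evaluate sum_n coeffs[n] z^{-n} at the complex points z."""
    return sum(c * z ** (-n) for n, c in enumerate(coeffs))


r = np.array([1.0, 3.0, 3.0, 1.0])   # assumed example: R(z) = (1 + z^{-1})^3
r0 = r[0::2]                          # even-indexed coefficients, R_0(z)
N = len(r) - 1                        # N = 3 is odd, so the length N + 1 is even

w = np.linspace(0.0, np.pi, 129)
z = np.exp(1j * w)

# Closed form (20): H(z) = R(z) / (sqrt(2) R_0(z^2)), and the same at -z.
h = eval_poly(r, z) / (np.sqrt(2.0) * eval_poly(r0, z ** 2))
h_alt = eval_poly(r, -z) / (np.sqrt(2.0) * eval_poly(r0, z ** 2))

# Allpass form (21): H(z) = (1 + z^{-N} A(z^2)) / sqrt(2),
# with A(z^2) = R_0(z^{-2}) / R_0(z^2).
a = eval_poly(r0, z ** -2) / eval_poly(r0, z ** 2)
h21 = (1.0 + z ** (-N) * a) / np.sqrt(2.0)

power_sum = np.abs(h) ** 2 + np.abs(h_alt) ** 2   # P(z) + P(-z) on the circle
```

The two expressions agree to rounding error, and `power_sum` equals 2, so no root extraction is needed to obtain a valid factor in this symmetric case.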
Observe that H(z) can be rewritten:

    H(z) = 2^{-1/2} (1 + z^{-N} A(z^2)),    (21)

where A(z) = R_0(z^{-1})/R_0(z) is an (N - 1)/2-th order allpass. The other analysis and
synthesis filters have similar expressions, and thus can be implemented very efficiently. It is
worth pointing out that the filters in this particular case are themselves valid.


4 Wavelets with moment properties


According to Proposition 2.1 the limits of iterated orthogonal digital filter banks can be used
to derive wavelet bases. The sufficient condition to guarantee continuity of the wavelets was
that the iterated lowpass filter, that is H_0(z), should contain an adequate number of zeros
at z = -1. It is for this reason that in the design of compactly supported wavelet bases
[3, 31, 4] the emphasis was placed on using filters that have a maximum number of zeros at
z = -1. In addition, a zero of order N at z = -1 in H_0(z) implies N vanishing moments
for the wavelet [3]:

    \int x^k \psi(x) dx = 0,    k = 0, 1, ..., N - 1.    (22)

It can be shown that having a maximum number of zeros at z = -1 implies a maximally
flat characteristic for the filters involved [3, 25, 32]. This implies that both the wavelet and
the filter spectrum have considerable smoothness, which may be advantageous in certain
contexts.
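
On the filter side, a zero of order N at z = -1 is equivalent to the vanishing of the alternating discrete moments sum_n (-1)^n n^k h(n) for k = 0, ..., N - 1. The sketch below counts such zeros for an FIR filter; it is an illustration only, the function name `zeros_at_minus_one` is hypothetical, and the 4-tap Daubechies filter is assumed as a test case.

```python
import numpy as np


def zeros_at_minus_one(h, tol=1e-10):
    """Count the zeros of H(z) = sum_n h[n] z^{-n} located at z = -1."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    sign = (-1.0) ** n
    count = 0
    for k in range(len(h)):
        # k-th alternating moment; it vanishes while the zero order exceeds k.
        if abs(np.sum(sign * n ** k * h)) > tol:
            break
        count += 1
    return count


s3 = np.sqrt(3.0)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

order_daub4 = zeros_at_minus_one(daub4)              # 4-tap Daubechies filter
order_binomial = zeros_at_minus_one([1, 3, 3, 1])    # (1 + z^{-1})^3
```

The tolerance `tol` is an assumed threshold for floating point round-off; for filters with exact rational coefficients an exact arithmetic version would be preferable.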

Our procedure to design orthogonal filters amounts then to the following:

(i) Choosing B_{2N}(z) = (1 + z^{-1})^N (1 + z)^N for some N,

(ii) Finding least degree positive real F(z) = F_N(z)/F_D(z) such that

    P(z) = B_{2N}(z) F(z) = \frac{(1 + z^{-1})^N (1 + z)^N F_N(z)}{F_D(z)}

is valid, and

(iii) Factoring P(z) = H_0(z) H_0(z^{-1}).

Of course in [3] only FIR solutions were of interest; so the solutions had F_D(z) = 1.
In other words the multiplicative factor F(z) required to make B_{2N}(z) F(z) valid had only
zeros. In the next subsection we examine the opposite extreme, where F(z) is all-pole, i.e.
F_N(z) = 1. These in fact give rise to the Butterworth halfband filters.

In section 4.2 we examine solutions intermediate between the Daubechies (F(z) all zero),
and Butterworth (F(z) all-pole); that is where F(z) is still of minimal degree, but has some
combination of poles and zeros.


4.1  Butterworth  wavelets 

Using Theorem 3.2, constructing regular IIR filter banks that lead to infinitely supported
wavelets is very simple. Following Daubechies and the FIR case, if we again place a maximum
number of zeros at z = -1 then we simply choose R(z) = (1 + z^{-1})^N. This gives:

    P(z) = \frac{2 (1 + z^{-1})^N (1 + z)^N}{(z^{-1} + 2 + z)^N + (-z^{-1} + 2 - z)^N} = H_0(z) H_0(z^{-1}).    (23)

These filters are the IIR counterparts of the FIR filters given in [3] in that they generate
wavelets with regularity that increases linearly with the degree N of the zero at z = -1.
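
The coefficients of the Butterworth P(z) follow directly from binomial coefficients, since (1 + z^{-1})^N (1 + z)^N = z^{-N} (1 + z)^{2N} and the denominator keeps only the even powers of z. The sketch below is illustrative; the function name `butterworth_p` is hypothetical, and the common constant factor between numerator and denominator is assumed cancelled.

```python
from math import comb


def butterworth_p(N):
    """Return (numerator, denominator) coefficient lists of the Butterworth
    P(z), ordered from z^N down to z^{-N}, with common constants cancelled."""
    # (1 + z^{-1})^N (1 + z)^N = z^{-N} (1 + z)^{2N}: binomial coefficients.
    num = [comb(2 * N, j) for j in range(2 * N + 1)]
    # Entry j multiplies z^{N-j}; R R~ + R(-z) R~(-z) keeps the even powers
    # only (doubled), so after cancelling the factor 2 the denominator is
    # the even-power part of the numerator.
    den = [c if (N - j) % 2 == 0 else 0 for j, c in enumerate(num)]
    return num, den


num7, den7 = butterworth_p(7)
```

For N = 7 the nonzero denominator entries are 14, 364, 2002, 3432, 2002, 364, 14, which are the coefficients of the even powers z^6 down to z^{-6}.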

These are in fact the N-th order halfband digital Butterworth filters [18]. That these
particular filters satisfy the conditions for orthogonality was also pointed out in [33], and
their use for the construction of wavelets in [34, 35]. The Butterworth filters are known to
be the maximally flat IIR filters of a given order.

We propose these Butterworth wavelets as alternatives to the compactly supported examples
of [3]; they enjoy exactly the same moment properties, but achieve much better
filtering action for the same complexity, and are considerably smoother. An additional
advantage is that since R(z) is symmetric we can make use of the closed form factorization of
section 3.2 if we choose N to be odd. So in this case we can explicitly write:

    H_0(z) = \frac{(1 + z^{-1})^N}{\sqrt{2} \sum_{l=0}^{(N-1)/2} \binom{N}{2l} z^{-2l}},

and the other filters follow from Theorem 3.2.


Example 4.1 Take R(z) = (1 + z^{-1})^N as above and N = 7, so that we can use the closed
form factorization, hence:

    P(z) = \frac{(1, 14, 91, 364, 1001, 2002, 3003, 3432, 3003, 2002, 1001, 364, 91, 14, 1) \cdot (z^7, z^6, ..., z^{-7})^T}
                {14 z^6 + 364 z^4 + 2002 z^2 + 3432 + 2002 z^{-2} + 364 z^{-4} + 14 z^{-6}}
         = \frac{E(z) E(z^{-1})}{F(z) F(z^{-1})},

where

    \frac{E(z)}{F(z)} = \frac{1 + 7 z^{-1} + 21 z^{-2} + 35 z^{-3} + 35 z^{-4} + 21 z^{-5} + 7 z^{-6} + z^{-7}}
                             {\sqrt{2} (1 + 21 z^{-2} + 35 z^{-4} + 7 z^{-6})}.

So using the description of the filters in Theorem 3.2, with the simplest case A_0(z) =
A_1(z) = 1 and k = 0 we find:

    H_0(z) = \frac{1 + 7 z^{-1} + 21 z^{-2} + 35 z^{-3} + 35 z^{-4} + 21 z^{-5} + 7 z^{-6} + z^{-7}}
                  {\sqrt{2} (1 + 21 z^{-2} + 35 z^{-4} + 7 z^{-6})},

    H_1(z) = \frac{z^{-1} (1 - 7 z + 21 z^2 - 35 z^3 + 35 z^4 - 21 z^5 + 7 z^6 - z^7)}
                  {\sqrt{2} (1 + 21 z^2 + 35 z^4 + 7 z^6)},

    G_0(z) = H_0(z^{-1}),    G_1(z) = H_1(z^{-1}).

The wavelet, scaling function and their spectra are shown in Figure 2.
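
The perfect reconstruction property of the filters in Example 4.1 can be verified on the unit circle. The sketch below is an illustration under stated assumptions: it takes H_1(z) = z^{-1} H_0(-z^{-1}) (the k = 0 case with unit allpass factors) and time reversed synthesis filters G_0(z) = H_0(z^{-1}), G_1(z) = H_1(z^{-1}), and it evaluates transfer functions pointwise rather than implementing the recursive filters.

```python
import numpy as np


def poly(coeffs, z):
    """Evaluate sum_n coeffs[n] z^{-n}."""
    return sum(c * z ** (-n) for n, c in enumerate(coeffs))


E = [1, 7, 21, 35, 35, 21, 7, 1]   # (1 + z^{-1})^7
F = [1, 0, 21, 0, 35, 0, 7]        # 1 + 21 z^{-2} + 35 z^{-4} + 7 z^{-6}


def H0(z):
    return poly(E, z) / (np.sqrt(2.0) * poly(F, z))


def H1(z):
    return H0(-1.0 / z) / z         # z^{-1} H0(-z^{-1})


def G0(z):
    return H0(1.0 / z)


def G1(z):
    return H1(1.0 / z)


w = np.linspace(0.0, np.pi, 65)
z = np.exp(1j * w)

# Gain seen by X(z) and by the aliased component X(-z) through both channels.
direct = 0.5 * (G0(z) * H0(z) + G1(z) * H1(z))
alias = 0.5 * (G0(z) * H0(-z) + G1(z) * H1(-z))
```

A `direct` gain of 1 together with a vanishing `alias` term means the output equals the input, which is the perfect reconstruction condition for the two channel bank.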

4.2  Intermediate  solutions 

At the beginning of the section we pointed out that in the construction of wavelets with a
certain number of vanishing moments, the essence of the design was finding a minimal degree
F(z) = F_N(z)/F_D(z), such that P(z) = B_{2N}(z) F(z) was valid. We now explore examples
between the extremes of the Daubechies (F_D(z) = 1) and the Butterworth (F_N(z) = 1)
cases.

First note that when F(z) is a rational autocorrelation, both numerator and denominator
will be of odd length, and symmetric. As pointed out in the proof of Theorem 3.2 the
denominator is in fact the upsampled whole sample symmetric (WSS) polyphase component
of the numerator. There are two cases:

•  A symmetric FIR function of length 4k + 1 has an upsampled WSS component of
length 4k + 1.

•  A symmetric FIR function of length 4k + 3 has an upsampled WSS component of
length 4k + 1.

To find a solution where P(z) has fewer poles than in the Butterworth case we must find
a function F_N(z)/F_D(z) where F_D(z) is of length 4(k - p) + 1 for some 0 < p < k and F_N(z)
is of minimal degree such that:

    P(z) = \frac{(1 + z^{-1})^N (1 + z)^N F_N(z)}{F_D(z)}

is valid. In the Daubechies case we fixed F_D(z) = 1 and found the minimal degree F_N(z),
and in the Butterworth we fixed F_N(z) = 1 and found the minimal degree F_D(z). For the
intermediate cases we fix the length of F_D(z) as 4(k - p) + 1 for some 0 < p < k and then find
the minimal degree F_N(z). For a given binomial factor (1 + z^{-1})^N (1 + z)^N the total number
of poles and zeros of F(z) will not necessarily be the same for the Daubechies, intermediate
and Butterworth solutions, although, in fact, it will never vary by more than two.

Note that F_D(z) is the WSS component of (1 + z^{-1})^N (1 + z)^N F_N(z), but is to be of lower
degree than the WSS component of (1 + z^{-1})^N (1 + z)^N. Thus it is apparent that some of the
terms of (1 + z^{-1})^N (1 + z)^N F_N(z) must be zero, and the WSS component must not contain
the endterms. This last condition implies that we must have that (1 + z^{-1})^N (1 + z)^N F_N(z)
is of length 4k + 3; since otherwise, if it is of length 4k + 1, the WSS component is also of
length 4k + 1, and contains the endterms. It is convenient to treat separately the two cases
for N even and odd.

N = 2k + 1 odd: The length of the denominator is 4(k - p) + 1. If we try F_N(z) of length
4p + 1 then the length of the numerator, (1 + z^{-1})^N (1 + z)^N F_N(z), is 4(k + p) + 3, and
that of its WSS component 4(k + p) + 1. The difference between the length of the WSS
component of (1 + z^{-1})^N (1 + z)^N F_N(z), and the length of F_D(z) is hence 4 \cdot 2p. Since the
WSS component of the numerator is symmetric, and a function of z^2, setting one pair of
its endterms to zero in fact decreases its length by 4. If this can be done 2p times then the
WSS component of the numerator, and the denominator will be of the same length. Note
that 2p is also the number of independent elements in F_N(z). In fact the solution is found
by solving a 2p x 2p system of linear equations.

N = 2k even: The length of the denominator again is 4(k - p) + 1. Now if we try F_N(z)
of length 4(p - 1) + 3, the length of the numerator is 4(k + p - 1) + 3, and that of its WSS
component 4(k + p - 1) + 1. The difference between the length of the WSS component of
the numerator, and the length of F_D(z) is hence 4 \cdot (2p - 1). Again 2p - 1 is the number
of independent elements in F_N(z). In this case the solution is found by solving a set of
(2p - 1) x (2p - 1) linear equations.

Example 4.2 N = 7. Note that N = 2k + 1 where k = 3. There are thus two intermediate
solutions for p = 1, 2. Taking the p = 1 case first, note that F_N(z) is of length 5, and we
wish to set 2p = 2 pairs of endterms of the WSS component of the numerator to zero. The
situation is illustrated below.

We have used "d" to indicate elements of the HSS component of the numerator, and "x" for
elements of the WSS component. Clearly if the indicated endterms of the WSS component
equal zero, then the denominator will be of length 9. It is easily seen that the conditions to
set the endterms to zero are

    14 + a = 0
    364 + 92a + 14b = 0,

=> (a, b) = (-14, 66). Thus F_N(z) = z^2 - 14 z + 66 - 14 z^{-1} + z^{-2}. The wavelet, scaling
function and spectra are shown in Figure 3.

The second intermediate solution is for p = 2, so that F_N(z) is of length 9, and we want the
WSS component of the numerator to have 2p = 4 pairs of endterms set to zero.

The conditions to set the indicated endterms to zero now are:

    14 + a = 0
    364 + 91a + 14b + c = 0
    2002 + 1001a + 364b + 92c + 14d = 0
    3432 + 3004a + 2016b + 1092c + 364d = 0.

The solution is (a, b, c, d) = (-14, 592/7, -274, 3218/7). The wavelet, scaling function and
spectra are shown in Figure 4.

Note: The denominator is of length 4(k - p) + 1 in these intermediate solutions, with
0 < p < k. For p = 0 we would get the Butterworth solution, and for p = k the Daubechies'.
For k = 0, 1, that is N = 1, 2, 3, there are obviously no intermediate solutions.
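
The linear systems of Example 4.2 can be generated mechanically from the binomial coefficients. The sketch below is an illustration for odd N only; it assumes F_N(z) is symmetric and monic with unknown coefficients ordered from the outermost inward, and the function name `intermediate_fn` is hypothetical.

```python
import numpy as np
from math import comb


def intermediate_fn(N, p):
    """Solve for the 2p unknown coefficients of the symmetric, monic F_N(z)
    of length 4p + 1 (N odd), ordered as (a, b, ...) from the outside in."""
    B = [comb(2 * N, j) for j in range(2 * N + 1)]   # (1+z^-1)^N (1+z)^N
    L = 4 * p + 1                                     # length of F_N(z)
    rows, rhs = [], []
    # Odd positions (counted from the highest power) are the endterms of the
    # WSS component of the product B(z) F_N(z) that must vanish.
    for i in range(1, 4 * p, 2):
        row = np.zeros(2 * p)
        const = 0.0
        for m in range(min(i, L - 1) + 1):
            u = min(m, L - 1 - m)        # symmetry: f[m] = f[L - 1 - m]
            if u == 0:
                const += B[i - m]        # monic end coefficient f[0] = 1
            else:
                row[u - 1] += B[i - m]
        rows.append(row)
        rhs.append(-const)
    return np.linalg.solve(np.array(rows), np.array(rhs))


sol_p1 = intermediate_fn(7, 1)   # (a, b)
sol_p2 = intermediate_fn(7, 2)   # (a, b, c, d)
```

Floating point is used here for brevity; since all coefficients are integers, exact rational arithmetic would reproduce the fractions 592/7 and 3218/7 without rounding.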


4.3 Tabulating the P(z) functions

A point that we would wish to emphasize is that in all of the design techniques discussed
above it was the construction of P(z) that was central. This was the case for the Daubechies'
designs of [3], and the Butterworth and intermediate designs of sections 4.1 and 4.2. Once
P(z) is determined the magnitude spectrum of H_0(z) is fixed irrespective of the allpass
factors A_0(z) and A_1(z) of Theorem 3.2, and the factorization chosen.

If we desire filters that are maximally flat, or equivalently, wavelets that have a maximum
number of vanishing moments then we design a P(z) with the maximum number of zeros
at z = -1. Those minimum degree P(z)'s with this property are easily listed, and this has
been done in Table 1 for the cases N = 1, 2, ..., 7. The table exhausts the minimal degree
maximally flat P(z) autocorrelation functions for these orders. A crude estimate of the
regularity of the wavelets associated with each function is given.

For comparison purposes the graphs of the N = 7 Daubechies wavelet and scaling function
are given in Figure 5.

4.4  Irrational  Factorizations 

Theorem 3.2 demonstrates how to calculate all valid rational autocorrelation functions. For
implementation reasons we have been interested only in orthogonal rational factorizations. It
is nonetheless possible to take an irrational factorization of a rational P(z) function and use it
to derive an orthonormal wavelet basis. For example if we take P(z) = H_0(z) H_0(z^{-1}) where
H_0(z) = \sqrt{P(z)} we end up with linear phase filters. That H_0(z) is necessarily irrational,
where one of the P(z) functions designed in this section is used, is guaranteed by Lemma
5.1 below. For example if we use the Butterworth N = 7 case, as in example 4.1, we get the
wavelet and scaling function shown in Figure 6. The magnitude spectra plots are of course
identical to those in Figure 2, since these are independent of the factorization chosen. It is
worth pointing out that the wavelet is very similar to an orthonormal wavelet constructed
by Meyer, but based on irrational filters [5].

Clearly Theorem 3.2 generates all orthogonal filter banks where P(z) is rational, even if
the filters themselves are not so.


5 Linear phase orthogonal IIR solutions

In [3, 31, 4] it was pointed out that it is not possible to generate a nontrivial basis of real
finite length wavelets which are orthonormal and symmetric. In fact the only solution is the
Haar basis, which is not continuous. If we were prepared to consider complex FIR filters it
would be possible [36], but filters with complex coefficients are not generally of interest.

We have not thus far addressed the possibility of achieving linear phase with orthogonal
rational IIR filters. We first consider the possibility that one of the maximally flat P(z)
functions already derived might factor P(z) = H(z) H(z^{-1}), where H(z) is a rational linear
phase filter. The next lemma proves that this is never possible for the Daubechies, intermediate
or Butterworth P(z) functions of any order. In other words if we desire linear phase
filters the solutions presented so far will not serve.

After considering once more the structure of orthogonal IIR solutions however, we see
how the linear phase condition can be structurally imposed, and use this to generate designs.
While the filters never have as many zeros at z = -1 as those of section 4, they give wavelets
that are very smooth. This result was presented in preliminary form in [15, 16].


5.1  Structure  of  linear  phase  orthogonal  solutions 

We first show that none of the particular orthogonal solutions presented so far can be used
if rational filters are required.

Lemma 5.1 The Daubechies, intermediate and Butterworth solutions to the equation:

    P(z) + P(-z) = 2,

can never be factored P(z) = H(z) H(z^{-1}) where H(z) is a rational linear phase filter.


The proof is in appendix A.2.

The above result is not unexpected: all of these designs were found by merely ensuring
that Lemma 2.2(c) was satisfied. If we wish in addition to guarantee linear phase we shall
have to impose this structurally before we begin the design. We find it more convenient to
work with the equivalent condition Lemma 2.2(b). We first recall an important preliminary
result on the structure of orthogonal polyphase matrices [3, 26].


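Lemma 5.2 An orthogonal polyphase matrix is necessarily of the form:

    H_p(z) = \begin{bmatrix} H_{00}(z) & H_{01}(z) \\ -A_p(z) H_{01}(z^{-1}) & A_p(z) H_{00}(z^{-1}) \end{bmatrix},    (24)

where:

    H_{00}(z) H_{00}(z^{-1}) + H_{01}(z) H_{01}(z^{-1}) = 1 = A_p(z) A_p(z^{-1}),    (25)

and A_p(z) = det H_p(z) is an allpass function.

Proof: Lemma 2.2(b) gives immediately:

    \begin{bmatrix} H_{00}(z^{-1}) & H_{10}(z^{-1}) \\ H_{01}(z^{-1}) & H_{11}(z^{-1}) \end{bmatrix}
        = \frac{1}{A_p(z)} \begin{bmatrix} H_{11}(z) & -H_{01}(z) \\ -H_{10}(z) & H_{00}(z) \end{bmatrix},

which leads to:

    H_{11}(z) = H_{00}(z^{-1}) A_p(z) = H_{00}(z^{-1}) / A_p(z^{-1}),

from which follows that A_p(z^{-1}) = [A_p(z)]^{-1}, that is, A_p(z) is an allpass filter [26]. Also:

    H_{10}(z) = -H_{01}(z^{-1}) A_p(z). □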

We  have  seen  before  that  linear  phase  filters  are  of  two  types,  those  that  have  half  sample 
symmetry  or  antisymmetry  (HSS  or  HSA)  and  those  that  have  whole  sample  symmetry  or 
antisymmetry (WSS or WSA); again we find it convenient to treat them separately.


5.2 Half sample symmetric case

If linear phase filters are half sample symmetric or antisymmetric then the polyphase
components are related as in fact A.1. We can use this to force linear phase on the polyphase
filter matrix (24).


Lemma 5.3 In an orthogonal filter bank, where the filters are half sample symmetric, it is
necessary and sufficient that the polyphase matrix be of the form:

    H_p(z) = \begin{bmatrix} A(z) & z^{-l} A(z^{-1}) \\ -z^{l-n} A(z) & z^{-n} A(z^{-1}) \end{bmatrix},    (26)

where A(z) A(z^{-1}) = 1.


Proof: One of the filters must be HSS while the other is HSA, since these always have at
least one zero each at z = -1 and z = 1 respectively, and, because of (25), the filters must
have no zeros in common.

Hence if H_0(z) = H_{00}(z^2) + z^{-1} H_{01}(z^2) is HSS then H_{00}(z) = z^{-l} H_{01}(z^{-1}) for some l.
Similarly H_{10}(z) = -z^{-m} H_{11}(z^{-1}) for some m. The HSS polyphase matrix is

    H_p(z) = \begin{bmatrix} H_{00}(z) & z^{-l} H_{00}(z^{-1}) \\ H_{10}(z) & -z^{-m} H_{10}(z^{-1}) \end{bmatrix}.    (27)

On equating (24) and (27) we get H_{01}(z) = z^{-l} H_{00}(z^{-1}), H_{10}(z) = -H_{00}(z) A_p(z) z^{l}, and:

    -z^{-m} H_{10}(z^{-1}) = z^{-m-l} H_{00}(z^{-1}) A_p(z^{-1}) = H_{00}(z^{-1}) A_p(z).

Now the fact that A_p(z) = 1/A_p(z^{-1}) gives [A_p(z)]^2 = z^{-m-l}, so that A_p(z) is a delay z^{-n},
and 2n = m + l. This is the desired result. □


For example choosing l = n = 0, we get:

    H_0(z) = A(z^2) + z^{-1} A(z^{-2})    (28)

    H_1(z) = -A(z^2) + z^{-1} A(z^{-2})    (29)

In order to force some regularity we might wish to design H_0(z) to have again the
maximum possible number of zeros at z = -1. This can be done by solving a fairly simple
set of nonlinear equations. Taking the filters in (28) and (29) and the simple allpass section:

    A(z) = \frac{1 + a z^{-1} + b z^{-2}}{b + a z^{-1} + z^{-2}}

with a = 6, b = 15/7 we get that H_0(z) contains five zeros at z = -1, has a reasonable
lowpass response and gives a wavelet that is very smooth. The wavelet and its spectrum are
shown in Figure 7.
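
The claim of five zeros at z = -1 can be confirmed in exact rational arithmetic. The sketch below is illustrative: it writes H_0(z) = A(z^2) + z^{-1} A(z^{-2}) over a common denominator, so that its numerator is N1(z)^2 + z^{-1} D1(z)^2 with N1 and D1 the numerator and denominator of A(z^2), and then divides repeatedly by (1 + z^{-1}). The helper names are hypothetical.

```python
from fractions import Fraction

a, b = Fraction(6), Fraction(15, 7)


def polymul(p, q):
    """Multiply two polynomials in q = z^{-1} (ascending coefficient lists)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def divide_by_one_plus_q(p):
    """Synthetic division of p(q) by (1 + q); returns (quotient, remainder)."""
    quot = []
    carry = Fraction(0)
    for c in p[:-1]:
        carry = c - carry
        quot.append(carry)
    return quot, p[-1] - carry


n1 = [1, 0, a, 0, b]    # numerator of A(z^2):   1 + a z^-2 + b z^-4
d1 = [b, 0, a, 0, 1]    # denominator of A(z^2): b + a z^-2 + z^-4

# Numerator of H0(z) = n1/d1 + z^-1 d1/n1 over the common denominator n1*d1.
first = polymul(n1, n1) + [Fraction(0)]
second = [Fraction(0)] + polymul(d1, d1)
h0_num = [x + y for x, y in zip(first, second)]

zeros = 0
remaining = h0_num
while True:
    quot, rem = divide_by_one_plus_q(remaining)
    if rem != 0:
        break
    zeros += 1
    remaining = quot
```

Because the denominator n1*d1 does not vanish at z = -1, the multiplicity found for the numerator is the multiplicity of the zero of H_0(z) itself.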

5.3 Whole sample symmetric case

Next  suppose  Ho{z)  is  to  be  whole  sample  symmetric  (WSS).  In  this  case  one  of  the  polyphase 
components  must  oe  half  sample  symmetric,  the  other  whole  sample  symmetric,  and  both 
must  be  either  symmetric  or  antisymmetric.  Since  antisymmetric  filters  always  have  a  zero 
at  2  =  1  the  latter  case  can  never  satisfy  (25). 

It  is  also  implied  by  (25)  that  the  denominators  of  Hoo{^)  &nd  Hoi{z)  are  equal,  so  we 
must  solve: 

Aroo(i)Aro„(2-‘)  +  =  D(,z)D(z-'), 

where  Noo{z)  and  No\{z)  are  the  numerators  and  D(z)  is  the  common  denominator. 

Since  a  rational  IIR  filter  is  symmetric  if  and  only  if  both  numerator  and  denominator 
are,  we  need  consider  only  the  symmetry  of  Noo{z),  Noi{z)  and  D{z).  There  are  four  cases 
that  give  that  Boo{z)  and  Hoi{z)  have  the  whole/half  sample  symmetries  described  above. 
One  can  verify  that  these  are  tnat  D{z),  Noo(-)  and  Nox{z)  are  all  symmetric  and  have 
lengths  that  are  respectively  (odd,  odd,  even),  (odd,  even,  odd),  (even,  even,  odd)  and 
(even,  odd,  even).  The  last  two  cases,  where  Diz)  has  even  length,  are  immediately  ruled 
out,  since  a  symmetric  even  length  FIR  function  implies  at  least  one  zero  on  the  unit  circle. 

For  example  for  the  (odd,  odd,  even)  case  Hoo{z)  has  whole  sample  symmetry,  Hoi{z) 
has  half  sample  symmetry,  and  the  polyphase  matrix  is  lossless  and  gives  filters  that  have 
whole  sample  symmetry. 

Finding good solutions is not as easy as in the HSS case, since the method is not constructive.
However, examples can be constructed by solving a set of nonlinear equations. Consider
the small example: $N_{00}(z) = a + bz^{-1} + az^{-2}$, $N_{01}(z) = c + cz^{-1}$, and $D(z) = a + dz^{-1} + az^{-2}$.
The values $(a, b, c, d) = ((5 + 4\sqrt{2})/14,\ 1,\ (12 + 4\sqrt{2})/14,\ (21 + 24\sqrt{2} + 16 \cdot 2^{3/2})/49)$ give
a solution such that the lowpass filter $H_0(z)$ has two zeros at $z = -1$. An estimate of its
regularity gives $r > 0.5$.
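
The small example can be checked numerically. The sketch below is plain Python and is not taken from the paper; the helper names `autocorrelation` and `padded` are illustrative. It assumes the coefficient layout $N_{00}(z) = a + bz^{-1} + az^{-2}$, $N_{01}(z) = c + cz^{-1}$, $D(z) = a + dz^{-1} + az^{-2}$, forms the coefficient sequences of $N_{00}(z)N_{00}(z^{-1}) + N_{01}(z)N_{01}(z^{-1})$ and of $D(z)D(z^{-1})$, and compares them lag by lag.

```python
import math

# Quoted values of the small example (coefficients ordered z^0, z^-1, z^-2).
s2 = math.sqrt(2.0)
a = (5 + 4 * s2) / 14
b = 1.0
c = (12 + 4 * s2) / 14
d = (21 + 24 * s2 + 16 * 2 ** 1.5) / 49

n00 = [a, b, a]   # assumed N_00(z): odd length, symmetric
n01 = [c, c]      # assumed N_01(z): even length, symmetric
den = [a, d, a]   # assumed D(z): odd length, symmetric


def autocorrelation(p):
    """Coefficients of p(z) p(1/z) for lags -(len(p)-1) .. +(len(p)-1)."""
    n = len(p)
    return [sum(p[i] * p[i + lag] for i in range(n) if 0 <= i + lag < n)
            for lag in range(-(n - 1), n)]


def padded(r, half_width):
    """Centre the lag sequence r in a list covering lags -half_width..half_width."""
    pad = half_width - (len(r) - 1) // 2
    return [0.0] * pad + list(r) + [0.0] * pad


lhs = [x + y for x, y in zip(padded(autocorrelation(n00), 2),
                             padded(autocorrelation(n01), 2))]
rhs = padded(autocorrelation(den), 2)
residual = max(abs(x - y) for x, y in zip(lhs, rhs))
print("largest coefficient mismatch:", residual)
```

Under these assumptions the two sides agree to rounding error, which supports reading the example with these symmetric coefficient patterns.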


6  Orthogonalization  of  wavelet  bases 

One  of  the  interesting  wavelet  bases  is  that  derived  by  Battle  and  Lemarie  [6,  7],  which  has 
the  property  of  being  a  basis  for  the  spline  function  spaces. 

The B-spline functions obviously form a basis for this space, but are not orthogonal with
respect to integer shifts; in the language of section 2.1.1 we have a basis for $V_0$, but not an
orthogonal one. The condition for orthogonality can also be written in the Fourier domain
using the Poisson summation formula [20, 25]:

$$\langle \phi(x), \phi(x - n) \rangle = \delta_n \iff \sum_{k=-\infty}^{\infty} |\Phi(\omega + 2\pi k)|^2 = 1. \tag{30}$$

Now assume that we have a non-orthogonal basis for a multiresolution analysis, given by a
function $g(x)$ and its integer translates. Then it is easy to see one way that the orthogonalization
of the non-orthogonal basis $g(x - n)$ may be performed in the Fourier domain:

$$\Phi(\omega) = \frac{G(\omega)}{\sqrt{\sum_{k=-\infty}^{\infty} |G(\omega + 2\pi k)|^2}}. \tag{31}$$


Clearly $\Phi(\omega)$ satisfies the Fourier domain orthogonality condition (30), and the rest of the
multiresolution analysis machinery follows; this is precisely the procedure followed by Battle
and Lemarie [6, 7]. The sequence $h_0(n)$ associated with the two scale difference equation (2)
for $\phi(x)$ is not given by a rational function, however. Hence the filter bank implementation is
not realizable, by which we mean that there is no finite complexity recursive implementation
of the filters. Often for such non-realizable filters a truncated version of the infinite impulse
response is taken, so that an approximate FIR implementation is used; see for example [1].


6.1  Orthogonalizing  continuous-time  bases  with  recursive  filters 

Of course there are many different orthonormal bases that span the same space; the ones
derived by Battle and Lemarie are by no means the only ones for the spline spaces. We next
show that if there is a compactly supported wavelet basis for $V_0$, then it is always possible
to find an orthonormal basis, which is infinitely supported, but for which the filters involved
are rational and thus realizable. As a special case we shall construct realizable bases for the
spline spaces, which are alternatives to those of Battle and Lemarie.

Theorem 6.1 If the set $\{g(x - k), k \in Z\}$ forms a non-orthogonal basis for $V_0$, obeys a
two-scale difference equation, and $g(x)$ is compactly supported, then it is always possible to
find an orthonormal basis $\{\phi(x - k), k \in Z\}$, where:

$$\Phi(\omega) = \prod_{i=1}^{\infty} 2^{-1/2} H_0(e^{j\omega/2^i}) \tag{32}$$

and where $H_0(e^{j\omega})$ is a rational function of $e^{j\omega}$.

Proof: The proof is constructive. The normalizing function used in (31) (i.e. the denominator
of the right hand side) is $2\pi$-periodic, and can be written as a discrete-time Fourier
transform:

$$\sum_{k=-\infty}^{\infty} |G(\omega + 2\pi k)|^2 = \sum_{n} c_n e^{-j\omega n} = C(e^{j\omega}). \tag{33}$$

It can be shown [20, 25] that the Fourier coefficients are obtained from:

$$c_n = \int_{-\infty}^{\infty} g(x)\, g(x - n)\, dx. \tag{34}$$

Since $g(x)$ is compactly supported, only finitely many of the $c_n$ are nonzero.
Equally, since the $c_n$ are the Fourier coefficients of a positive real function, we can
always factor:

$$C(z) = E(z)E(z^{-1}). \tag{35}$$

Note that $C(z)$ cannot have zeros on the unit circle, because the $g(x - n)$ form
a basis [20].

The choice

$$\Phi(\omega) = \frac{G(\omega)}{E(e^{j\omega})} \tag{36}$$

clearly satisfies (30), so that the $\phi(x - k)$ are orthogonal. Since $E(e^{j\omega})$ is $2\pi$-periodic we get:

$$\phi(x) = \sum_{k} f(k)\, g(x - k),$$

where $F(e^{j\omega}) = 1/E(e^{j\omega})$. That is, $\phi(x)$ is a linear combination of shifted versions of $g(x)$;
hence the span of $\{\phi(x - k), k \in Z\}$ is also $V_0$.

So we now have that both the sets $\{g(x - k), k \in Z\}$ and $\{\phi(x - k), k \in Z\}$ form bases
for $V_0$. But $g(x)$ obeys the two scale difference equation (2); so for some $l(n)$:

$$g(x) = 2^{1/2} \sum_{n=-\infty}^{\infty} l(n)\, g(2x - n) \implies G(\omega) = 2^{-1/2} L(e^{j\omega/2})\, G(\omega/2). \tag{37}$$

However, since the two sets span the same space, we can always write the function $\phi(x)$ as
a linear combination of the functions $g(x - k)$:

$$\phi(x) = 2^{1/2} \sum_{n=-\infty}^{\infty} q(n)\, g(2x - n), \tag{38}$$

so that by substituting in the expression for $g(x)$ from (37) we get that for some sequence
$h_0(n)$:

$$\phi(x) = 2^{1/2} \sum_{n=-\infty}^{\infty} h_0(n)\, \phi(2x - n) \implies \Phi(\omega) = 2^{-1/2} H_0(e^{j\omega/2})\, \Phi(\omega/2). \tag{39}$$

Thus $\phi(x)$ satisfies a two scale difference equation also.

Substituting (36) into the Fourier version of (39) we get:

$$\frac{G(\omega)}{E(e^{j\omega})} = 2^{-1/2} H_0(e^{j\omega/2}) \cdot \frac{G(\omega/2)}{E(e^{j\omega/2})}.$$

Comparing this with (37) gives the relation:

$$H_0(e^{j\omega}) = \frac{L(e^{j\omega}) \cdot E(e^{j\omega})}{E(e^{j2\omega})}. \tag{40}$$

Note that $L(e^{j\omega})$ is an FIR function, since when it is iterated in (37) it gives $g(x)$, which is
compactly supported. Equally $E(e^{j\omega})$ is FIR, since it is one of the factors of $C(e^{j\omega})$. Hence
$H_0(e^{j\omega})$ is a rational function of $e^{j\omega}$ and corresponds to a filter that can be implemented
recursively. □
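
The first steps of the construction, (33) and (34), can be illustrated numerically. The sketch below is not from the paper: it assumes $g(x)$ is the piecewise-linear (hat) B-spline on $[0, 2]$, for which $|G(\omega)|^2 = (\sin(\omega/2)/(\omega/2))^4$, estimates the $c_n$ of (34) with a midpoint rule, and compares a truncated version of the sum on the left of (33) with $\sum_n c_n e^{-j\omega n}$. The function names are illustrative.

```python
import math


def hat(x):
    """Piecewise-linear B-spline on [0, 2]; assumed here as the example g(x)."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def inner_product(n, steps=6000):
    """Midpoint-rule estimate of c_n = integral of g(x) g(x - n) dx, as in (34)."""
    lo, hi = -2.0, 4.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += hat(x) * hat(x - n)
    return total * h


def poisson_sum(w, terms=200):
    """Truncated left side of (33), using |G(w)|^2 = (sin(w/2) / (w/2))^4."""
    total = 0.0
    for k in range(-terms, terms + 1):
        t = 0.5 * (w + 2.0 * math.pi * k)
        total += 1.0 if t == 0.0 else (math.sin(t) / t) ** 4
    return total


# Only finitely many c_n are nonzero because g(x) is compactly supported.
c = {n: inner_product(n) for n in (-2, -1, 0, 1, 2)}


def c_of_w(w):
    """Right side of (33); real-valued because c_n = c_{-n}."""
    return sum(cn * math.cos(w * n) for n, cn in c.items())


max_gap = max(abs(poisson_sum(w) - c_of_w(w)) for w in (0.0, 0.7, 1.9, math.pi))
print({n: round(v, 6) for n, v in c.items()}, max_gap)
```

For this choice of $g(x)$ the nonzero coefficients come out close to $(1/6, 2/3, 1/6)$; six times these values is $(1, 4, 1)$, which appears to match the $N = 2$ entry of Table 2.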

Since $\phi(x)$ gives an orthogonal basis for $V_0$ we see from section 2.1.1 that $h_0(n)$ and
$h_1(n)$ (given by $H_1(z) = H_0(-z^{-1})$) satisfy (5) and (4). In other words the conditions
of Lemma 2.2(d) hold and we have an orthogonal filter bank, with rational filters, that
generates our basis for $V_0$. It follows from theorem 3.2 that the function

$$P(z) = \frac{L(z)L(z^{-1}) \cdot E(z)E(z^{-1})}{E(z^2)E(z^{-2})}$$

is valid.
Note that we do not have to separately consider the convergence of the infinite product
implied in (39). Because of (40) successive numerators and denominators of the product
cancel:

$$\Phi(\omega) = \prod_{i=1}^{\infty} 2^{-1/2} H_0(e^{j\omega/2^i}) = \prod_{i=1}^{\infty} 2^{-1/2} L(e^{j\omega/2^i}) \cdot \prod_{i=1}^{\infty} \frac{E(e^{j\omega/2^i})}{E(e^{j\omega/2^{i-1}})}$$

$$= G(\omega) \cdot \frac{E(e^{j\omega/2})}{E(e^{j\omega})} \cdot \frac{E(e^{j\omega/4})}{E(e^{j\omega/2})} \cdot \frac{E(e^{j\omega/8})}{E(e^{j\omega/4})} \cdots = G(\omega) \cdot \frac{1}{E(e^{j\omega})}. \tag{41}$$

So the infinite product for $H_0(e^{j\omega})$ converges since that for $L(e^{j\omega})$ does. This also means
that we do not have to separately make regularity estimates for $\Phi(\omega)$ if the regularity of
$G(\omega)$ is known, since $\phi(x)$ is a linear combination of integer shifts of the function $g(x)$, and
thus has the same regularity.


6.2  Bases  for  the  spline  spaces  using  recursive  filter  banks 

An application of the above result is to find bases for the spline spaces. First note that the
N-th order B-spline function, which is defined by $g(x) = s(x) * s(x) * \cdots * s(x)$, where $s(x)$
appears N times and $s(x)$ is the characteristic function of the interval (0, 1), is compactly
supported. Further the set $\{g(x - k), k \in Z\}$ is a basis for the N-th order spline function
space. To get an orthogonal basis from this we apply theorem 6.1.

Note that the Fourier transform of the B-spline $g(x)$ can be written [20]:

$$G(\omega) = \prod_{i=1}^{\infty} \left( \frac{1 + e^{-j\omega/2^i}}{2} \right)^N.$$

In other words $L(z) = 2^{-N+1/2} (1 + z^{-1})^N$.

The coefficients of $E(z)E(z^{-1})$ are found from (34), that is by evaluating $\langle g(x), g(x - n) \rangle$
[20, 37]. Those of $E(z)$ are then obtained by spectral factorization (35). So we end up with:

$$H_0(z) = \frac{2^{-N+1/2} (1 + z^{-1})^N E(z)}{E(z^2)}. \tag{42}$$

Successive terms in the infinite product cancel, as in (41), and we get

$$\Phi(\omega) = G(\omega) \cdot \frac{1}{E(e^{j\omega})}.$$

Hence:

$$\phi(x) = \sum_{k=-\infty}^{\infty} f(k)\, g(x - k),$$

where $F(e^{j\omega}) = 1/E(e^{j\omega})$ is an all-pole filter; that is, $\phi(x)$ is a linear combination of splines.

Finding polynomial solutions $E(z)$ such that $H_0(z)$ in (42) gives an orthogonal filter
bank was also done by Stromberg [38]. This solution was also noted by Unser and Aldroubi.

That the wavelet and scaling function are indeed splines is most easily seen for the N = 2
case where they are piecewise linear. The wavelet and scaling function are shown in Figure
8.
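
As an illustration of this recipe for the piecewise-linear case N = 2, the sketch below (not from the paper) takes $C(z) = (z + 4 + z^{-1})/6$, performs the spectral factorization $C(z) = E(z)E(z^{-1})$ in closed form with a first-order minimum-phase $E(z)$, builds $H_0$ as $(1 + z^{-1})^N E(z)/E(z^2)$, and checks $P(z) + P(-z) = 2$ on the unit circle with $P = |H_0|^2$. The scale factor $2^{1/2 - N}$ is an assumption, chosen so that $P(1) = 2$; the names `E`, `H0` and `P` simply mirror the symbols in the text.

```python
import cmath
import math

N = 2  # piecewise-linear splines

# C(z) = (z + 4 + z^-1) / 6 for the hat function; its zeros solve z^2 + 4z + 1 = 0.
root = -2.0 + math.sqrt(3.0)          # the zero inside the unit circle
e0 = math.sqrt(1.0 / (6.0 * -root))   # from e0 * e1 = 1/6 with e1 = -root * e0
e1 = -root * e0


def E(z):
    """Minimum-phase spectral factor E(z) = e0 + e1 z^-1."""
    return e0 + e1 / z


def H0(w):
    """H_0 on the unit circle, using the assumed scaling 2^(1/2 - N)."""
    z = cmath.exp(1j * w)
    return 2.0 ** (0.5 - N) * (1.0 + 1.0 / z) ** N * E(z) / E(z * z)


def P(w):
    """P on the unit circle, i.e. |H_0|^2."""
    return abs(H0(w)) ** 2


grid = [2.0 * math.pi * i / 257 for i in range(257)]
worst = max(abs(P(w) + P(w + math.pi) - 2.0) for w in grid)
pole_radius = math.sqrt(abs(root))    # poles of 1 / E(z^2) lie at z^2 = root
print(e0, e1, worst, pole_radius)
```

With this factor, $E(1) = 1$ and the poles of $H_0$ lie inside the unit circle, so the filter in this sketch is stable as well as rational.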
The relation between the wavelet basis proposed here, and those of Battle and Lemarie
is readily seen if we consider the associated function $P(z)$:

$$P(z) = \frac{2^{-2N+1} (1 + z)^N (1 + z^{-1})^N \cdot E(z)E(z^{-1})}{E(z^2)E(z^{-2})}. \tag{43}$$

Observe that if we factor it as $P(z) = \sqrt{P(z)} \cdot \sqrt{P(z)}$ and use $H_0(z) = \sqrt{P(z)}$ in (32) the
cancellation property between successive numerators and denominators still holds, and we
end up with:

$$\Phi(\omega) = G(\omega) \cdot \frac{1}{\sqrt{E(e^{j\omega})E(e^{-j\omega})}},$$

which is the same as the form in (31) when $E(e^{j0}) = 1$.

In words: the different orthonormal bases here correspond to different factorizations of
$P(z)$. Note however that it is not in general true that different orthogonal factorizations of
$P(z)$ give rise to wavelet bases that span the same space. For example the choice
$H_0(z) = 2^{-N+1/2} (1 + z^{-1})^N E(z)/E(z^{-2})$ gives an orthogonal basis, but we do not get the
cancellations in the infinite product, and the wavelets do not span the spline spaces.

It is clear that the filters that generate the Battle-Lemarie wavelets have linear phase;
however they are not rational for any order, as is proved by the next lemma.


Lemma 6.2 The spline solutions to the equation

$$P(z) + P(-z) = 2,$$

can never be factored $P(z) = H(z)H(z^{-1})$ where $H(z)$ is a rational linear phase stable filter.

The proof is in Appendix A.2. Note that in this case unstable solutions are possible, i.e.
where $H(z)$ has poles on the unit circle, whereas no solutions at all were possible for the
cases covered in Lemma 5.1.


7  Conclusion 

We have examined in detail the structure of orthogonal two channel filter banks, and their
relation with orthonormal bases of wavelets. We placed particular stress on filters that have
a maximum number of zeros at $\pi$, since these maximally flat filters give rise to wavelets
that have a large number of disappearing moments and are very smooth. The Daubechies,
Butterworth and intermediate solutions were of this form. The filters that were used to
realize bases for the spline spaces also had a large, but not maximum, number of zeros at $\pi$.

It should also be pointed out that while in this paper we have been concerned exclusively
with orthogonal filter banks it is of course possible to factor any of the $P(z)$ functions we
have presented in a non-orthogonal fashion. This was essentially the procedure followed in
[31, 4], where linear phase factorizations of the Daubechies $P(z)$ were taken. As noted in
[40, 4], however, it can be difficult in the FIR case to get filters with flat spectra when linear
phase is desired. We observe that the problem becomes even worse when IIR filters are
involved. In other words it is very difficult to factor any of the IIR $P(z)$'s listed in tables 1
and 2 to obtain linear phase rational filters which still have acceptable response. Of course
it is always possible, as we saw for the Butterworth case in section 4.4 and the Battle-Lemarie
case in section 6.2, to factor any of these $P(z)$'s in a linear phase orthogonal fashion, but
then the filters involved are irrational.

An important consideration that is often encountered in the design of wavelets, or of the
filter banks that generate them, is the necessity of satisfying competing design constraints.
This makes it necessary to clearly understand whether desired properties are mutually
exclusive. For example in designing nontrivial linear phase wavelets it is found necessary to
abandon orthogonality [3, 4], or to use filters with complex rather than real coefficients [36],
or to abandon compact support and use rational filters (section 5 above) or irrational
filters [6, 7, 5]. Table 3 attempts to clarify some of the conflicts by tabulating which of the
properties orthogonality, linear phase, FIR, real coefficients and rational transfer function
are simultaneously attainable and commenting on the solutions.


A  Appendix 


A.1  Filters with symmetry

Fact A.1 For a symmetric discrete sequence $R(z) = R_0(z^2) + z^{-1}R_1(z^2)$ the following relations
between the polyphase components hold:

(i) $R(z)$ WSS: $R_0(z^{-1}) = R_0(z)$, $zR_1(z^{-1}) = R_1(z)$.

(ii) $R(z)$ HSS: $R_1(z^{-1}) = R_0(z)$.

(iii) $R(z)$ WSA: $R_0(z^{-1}) = -R_0(z)$, $zR_1(z^{-1}) = -R_1(z)$.

(iv) $R(z)$ HSA: $R_1(z^{-1}) = -R_0(z)$.

Proof: (i): WSS $\Rightarrow R(z) = R(z^{-1})$. So: $R_0(z^2) + z^{-1}R_1(z^2) = R_0(z^{-2}) + zR_1(z^{-2})$. Equating
even and odd powers of $z^{-1}$ we find: $R_0(z^{-1}) = R_0(z)$, $zR_1(z^{-1}) = R_1(z)$.

(ii): HSS $\Rightarrow R(z) = z^{-1}R(z^{-1})$. So $R_0(z^2) + z^{-1}R_1(z^2) = z^{-1}R_0(z^{-2}) + R_1(z^{-2})$. Equating
even and odd powers of $z$ gives $R_1(z^{-1}) = R_0(z)$.

The other properties follow by similar analysis. □

It follows immediately that an HSS filter always has a zero at $z = -1$, and an HSA filter
always has one at $z = 1$.
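
A small numerical illustration of relations (ii) and (iv) and of the zeros at $z = \mp 1$ follows. It is a sketch, not part of the original text: sequences are stored as hypothetical `{index: coefficient}` dictionaries so that negative indices are allowed, HSS is taken to mean $r(n) = r(1 - n)$ (that is, $R(z) = z^{-1}R(z^{-1})$) and HSA to mean $r(n) = -r(1 - n)$, and the two example filters are arbitrary.

```python
def polyphase(r):
    """Split {index: coeff} into R_0 (even indices) and R_1 (odd indices)."""
    r0 = {n // 2: v for n, v in r.items() if n % 2 == 0}
    r1 = {(n - 1) // 2: v for n, v in r.items() if n % 2 != 0}
    return r0, r1


def reverse(p):
    """Map p(z) to p(1/z) by negating every index."""
    return {-n: v for n, v in p.items()}


def negate(p):
    """Map p(z) to -p(z)."""
    return {n: -v for n, v in p.items()}


def evaluate(p, z):
    """Evaluate sum_n p[n] z^-n at a nonzero point z."""
    return sum(v * z ** (-n) for n, v in p.items())


# Arbitrary half sample symmetric and antisymmetric examples.
hss = {-1: 1.0, 0: 3.0, 1: 3.0, 2: 1.0}
hsa = {-1: -1.0, 0: -2.0, 1: 2.0, 2: 1.0}

s0, s1 = polyphase(hss)
a0, a1 = polyphase(hsa)

relation_ii_holds = reverse(s1) == s0          # R_1(1/z) = R_0(z)
relation_iv_holds = reverse(a1) == negate(a0)  # R_1(1/z) = -R_0(z)
hss_at_minus_one = evaluate(hss, -1.0)         # should vanish
hsa_at_plus_one = evaluate(hsa, 1.0)           # should vanish
print(relation_ii_holds, relation_iv_holds, hss_at_minus_one, hsa_at_plus_one)
```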

Fact A.2 For a rational IIR filter that has linear phase, let $N_1$ be the length of the numerator,
and $N_2$ the length of the denominator; then if $N_1 - N_2$ is even the filter is WSS or WSA,
and if $N_1 - N_2$ is odd it is HSS or HSA.

Proof: If $H(z) = N(z)/D(z)$ then:

$$H(e^{j\omega}) = \frac{N(e^{j\omega})}{D(e^{j\omega})} = \frac{|N(e^{j\omega})|}{|D(e^{j\omega})|} \cdot e^{j(\theta_N(\omega) - \theta_D(\omega))},$$

where $\theta_N(\omega)$ and $\theta_D(\omega)$ are the phases of the numerator and denominator respectively.

Clearly $H(z)$ will have linear phase if and only if both numerator and denominator do.

Since $N(z)$ and $D(z)$ are linear phase FIR functions we have: $N(z) = z^{l}N(z^{-1})$ where
$l$ is even if $N_1$ is odd, and $l$ is odd if $N_1$ is even. Also $D(z) = z^{m}D(z^{-1})$, with similar
constraints for $m$. Hence

$$H(z) = \frac{N(z)}{D(z)} = z^{l-m}\, \frac{N(z^{-1})}{D(z^{-1})}.$$

Now $l - m$ is even if $N_1$ and $N_2$ are both even or both odd (i.e. $N_1 - N_2$ is even), and is
odd otherwise (i.e. $N_1 - N_2$ is odd). Using fact A.1, $l - m$ even implies that $H(z)$ is WSS or
WSA, and $l - m$ odd implies that it is HSS or HSA. □


A.2  Proof of Lemmas 5.1 and 6.2

Proof: If $H(z)$ is linear phase then:

$$H(z) = z^{-k}\, \frac{C(z)C(z^{-1})}{D(z)D(z^{-1})}$$

for some integer delay $k$, and:

$$P(z) = H(z)H(z^{-1}) = \left[ \frac{C(z)C(z^{-1})}{D(z)D(z^{-1})} \right]^2.$$

Hence every pole and every zero must be double.

Butterworth case: In the Butterworth case $P(z)$ can be written:

$$P(z) = \frac{2(1 + z)^N (1 + z^{-1})^N}{(z^{-1} + 2 + z)^N + (-z^{-1} + 2 - z)^N} = \frac{2(1 + z^{-1})^{2N}}{(1 + z^{-1})^{2N} + (1 - z^{-1})^{2N}(-1)^N}.$$

Note that the denominator, $W(z)$, is a polyphase component of $(1 + z^{-1})^{2N}$ following Lemma
3.1. If all poles of $P(z)$ are to be double we must have, at each pole $z_0$:

$$W(z_0) = 0, \qquad \left. \frac{dW(z)}{d(z^{-1})} \right|_{z = z_0} = 0.$$

But

$$\frac{dW(z)}{d(z^{-1})} = 2N \cdot (1 + z^{-1})^{2N-1} - 2N \cdot (-1)^N \cdot (1 - z^{-1})^{2N-1}$$

is a polyphase component of $(1 + z^{-1})^{2N-1} = B_0(z^2) + z^{-1}B_1(z^2)$. So the polyphase
components of two successive binomials must share a zero. Consider:

$$(1 + z^{-1})^{2N} = (1 + z^{-1})(B_0(z^2) + z^{-1}B_1(z^2)) = (B_0(z^2) + z^{-2}B_1(z^2)) + z^{-1}(B_0(z^2) + B_1(z^2)).$$

If the polyphase component of $(1 + z^{-1})^{2N}$, which is $(B_0(z^2) + z^{-2}B_1(z^2))$, and that of
$(1 + z^{-1})^{2N-1}$ share a zero, then clearly $B_1(z^2)$ must contain this zero also. This would
imply that $B_0(z)$ and $B_1(z)$ are not coprime; this is a contradiction, however, since the polyphase
components of $(1 + z^{-1})^N$ are known to be coprime for all N [4].
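
The statement that the Butterworth denominator is a polyphase component of a binomial can be checked directly. The following sketch (an illustration, not from the paper; the function name is hypothetical) expands $(1 + z^{-1})^{2N} + (-1)^N (1 - z^{-1})^{2N}$ with binomial coefficients, strips the zero coefficients at both ends, and compares half of the result with the Butterworth denominators listed in Table 1.

```python
from math import comb


def butterworth_denominator(n):
    """Coefficients, in powers of z^-1, of (1+z^-1)^(2n) + (-1)^n (1-z^-1)^(2n)."""
    sign = (-1) ** n
    coeffs = [comb(2 * n, k) * (1 + sign * (-1) ** k) for k in range(2 * n + 1)]
    # Strip zero coefficients at both ends (a pure delay when n is odd).
    while coeffs and coeffs[0] == 0:
        coeffs.pop(0)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


# Half of each sequence is one polyphase component of the binomial (1+z^-1)^(2n).
halved = {n: [v // 2 for v in butterworth_denominator(n)] for n in range(2, 8)}
for n, row in halved.items():
    print(n, row)
```

For N = 2 and N = 3 this gives (1, 0, 6, 0, 1) and (6, 0, 20, 0, 6), in agreement with the corresponding Table 1 rows.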

Intermediate cases: Here we shall make use of fact A.2 to show that the solutions have to
be half sample symmetric or antisymmetric if they are to have linear phase, and then show
that they do not satisfy the form of Lemma 5.3.

(a) Consider the $N = 2k + 1$ case; since the numerator and denominator of $P(z)$ have length
$4(k + p) + 3$ and $4(k - p) + 1$ respectively, the numerator and denominator of $H(z)$ should
have lengths $N_1 = 2(k + p) + 2$ and $N_2 = 2(k - p) + 1$. Hence $N_1 - N_2 = 4p + 1$, which is
odd, and $H(z)$ must be HSS or HSA by fact A.2. But by Lemma 5.3 for $H(z)$ to be HSS
its polyphase components must be allpass filters. Each of the polyphase components has
numerator and denominator of lengths $2(k + p) + 1$ and $2(k - p) + 1$ respectively; hence they
cannot be allpass if $p \neq 0$.

(b) Consider $N = 2k$: here $N_1 = 2(k + p - 1) + 2$ and $N_2 = 2(k - p) + 1$, so $N_1 - N_2 = 2(2p - 1) + 1$,
which is again odd. So again $H(z)$ must be HSS or HSA, and the polyphase components
must be allpasses. As before, examining the lengths of the numerator and denominator of $H_{00}$
and $H_{01}$ rules this out. The lengths are $2(k + p - 1) + 1$ and $2(k - p) + 1$ respectively.

Daubechies case: the filters are always of even length, and hence either HSS or HSA if
linear phase. Hence their polyphase components must be allpasses; but since the only FIR
allpass is a delay the only solutions are those of length two. This was already proved in [3].
□


Proof of Lemma 6.2 Again all poles and zeros of $P(z)$ must be double if the filters have
linear phase. Recall that in this case we require the $P(z)$ with the form given in (43) to be
valid. Suppose indeed that every pole and zero were double; then we could write, for some
$D(z)$:


P{^) 


{D{z^)D{z-^)r 


(44) 


Since  the  numerator  is  a  polyphase  component  of  the  denominator: 


(l+r-‘)’''2''.(D(2)XI(j-‘ ))=  +  (! 


Evaluate at $z = 1$ to get:

2’''P(l)C(l))’=(C(l)0(l))», 

which is clearly a contradiction unless $D(1) = 0$. $D(1) = 0$ however implies poles on the
unit circle; for rational solutions to exist they must be unstable. □

Note that rational linear phase solutions which are unstable do exist in the spline case. For
example when $N = 2$:


p,  ,  ^  (I,4,6,4,l)(l,-4,6,-4,l) 

(1,0, -4,0, 6, 0,-4, 0,1) 

•(1,2,1)(1,-2,1)1^ 

(1,0, -2,0,1)  . 

The denominator in this case has double roots at $z = 1$ and $z = -1$, as does the numerator.
B  Figures

Figure 1: Maximally decimated two channel multirate filter bank.

Figure 2: Example of Butterworth orthogonal wavelet; here N = 7, and the closed form
factorization has been used. (a) The wavelet. (b) Spectrum of the wavelet. (c) Scaling
function. (d) Spectrum of the scaling function.

Figure 3: Example of intermediate solution orthogonal wavelet; this is the N = 7, p = 1
solution. (a) The wavelet. (b) Spectrum of the wavelet. (c) Scaling function. (d) Spectrum
of the scaling function.

Figure 4: Example of intermediate solution orthogonal wavelet; this is the N = 7, p = 2
solution. (a) The wavelet. (b) Spectrum of the wavelet. (c) Scaling function. (d) Spectrum
of the scaling function.

Figure 5: Example of Daubechies orthogonal wavelet; this is the N = 7 case. (a) The
wavelet. (b) Spectrum of the wavelet. (c) Scaling function. (d) Spectrum of the scaling
function.

Figure 6: Orthogonal basis from irrational factorization of Butterworth case N = 7. (a) The
wavelet. (b) The scaling function.

Figure 7: Example of linear phase orthogonal wavelet. (a) The wavelet (antisymmetric). (b)
Spectrum of the wavelet. (c) The scaling function (symmetric). (d) Spectrum of the scaling
function.
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avditj 
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(I  -f  f)(l  4*“*)  2~* 


r  »  0 


A^2 

Daubechies 


Butlerwortb 


(1 +x)*(l +  *-')*•  (-1. 4.-1)  *2- 

_ (1.0.6.0.1) _ 


r  >  0.5  —  < 

r  >0.5 


^  *  3 
Daubechies 

Butterworth 


(1  +  x)*(l  4  »-*)*  •  (3.-18. 38. -18.3)  •  **2 

(l4;)^l4x-»)^-^-’ 

_ (6.0.20.0.6) 


r  >  0.9150 

r  >  1.0 


Daubechies 
Intern)  ediaie 

Bullerworlh 


A  =  5 
Daubecliiet 


(1  4i)*(l  4  1-*)*  (-5. 40.-131, 208, -131. 40. -5). 1*2-** 

(l4i)‘*(l4i-*)^(l.-8.1)i-> 

(16U.0.448,0.160) 

(l4r)Vl4i-*)<i-* 

_ (1.0.28.0.70.0.26.0.1) _ 


f  >  1.2750 


r  >  1.497 


r  >  1.5 


(1  4*)*(1  4  1-')*- 

(35.  -350. 1520,  -3650, 5018,-3650, 1520,  -350, 35)  •  i‘2"'* 


f  >  1.5960 


Intermediate 

(142)‘(142-*)‘.(1.-10.34.-10.1) 

(1792.0.4606.0.1792) 

r  >  1,9991 

i 

Butterworth 

(l42)*(l4.*->)*.'-‘' 

(10,0.120.0.252.0.120.0.10) 

r  >  2.0 

A  =  6 
Daubechies 

(14i)*(14*-*)*-i*2-*® 

(-63. 756,  -4067, 12768,  -25374, 32216,  -25374, 12768,  -4067, 756,  -63) 

r  >  1.8880 

Intermediate  I 

(l4r)‘(l4.->)®-(l.-12.1)-z-’ 

(560.0.4928.0.9504.0.4928,0,560) 

r  >  2.5 

Intermediate  II 

(I4j)«(l4r-M*(l.-12.58.2.-126.4.58.2,-I2.1)z 

(147456.0.360446.0,147456) 

T  >  2.476 

Butterworth 

(I42)®(l4i-’)®-— ® 

( 1 .0,66.0.495.0.924.0.495.0.66.0.1 ) 

r  >  2.5 

A  =  7 

(1  4r)'(l  4  1"’)’  •--'■2-**. 

Daubechies 

(231.-3234.  20706.  -79674.  203161.  -356132.  430908, 

r  >  2.158 

-356132. 203161,  -79674,  20706,  -3234.  231) 

Intermediate  I 

(I4r)'(l4.— ’)'-(l,-14.66.-14.1)-.* 
(10752.0.79672,0,10752) 

r  >  2.998 

Intermediate  II 

(l4--)^(l4.-M^-(l.-14.592/7.-274,3218/7,-274.592/7.-14,l)z 

(720696/7.0,1703936/7.0,720896/7) 

r  >  2.998 

Butterworth 

(l4.)7(l4.-l)7,-6 

(14.0.364.0.2002.0.34.32.0.2002.0.364.0.14) 

r  >  3.0 

Table 1: The various $P(z)$ solutions for a given number of zeros at $z = -1$. Daubechies,
intermediate and Butterworth solutions for $N = 1, \ldots, 7$ are shown.
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Solution 

P{z) 

Regularity 

N  =  l 

N  =  2 

I 

(\  +  2)^(Uz-^)^-(lA,\)-Z-h} 

(1.0.4.0.1) 

N  =  Z 

iH'H  ii'ii  1 

B 

N  =  A 

(l  +  r)'‘(l+2-M'’-(l,120.1191.2416.1l91.120.1)r-327 
(1.0.120.0.1191.0.2416.0,1191.0.120.0.1) 

B 

N  =  b 

(l-f2)^(l-|-2-M^-(l,502,1460S,S8234,156190,S8234,1460S.502.1).z-‘*2® 

(1,0.502.0.14608,0,88234,0.156190.0.88234.0.14608.0.502.0.1) 

Table 2: The various $P(z)$ solutions for a given number of zeros at $z = -1$. Spline solutions
for $N = 1, \ldots, 5$ are shown.


Orthog.  Lin. phase  FIR  Real  Rational  Solutions
   1         1        1     1       1     Haar basis (1910)
   0         1        1     1       1     Biorthogonal solutions [31, 4]
   1         0        1     1       1     Daubechies [3], FIR paraunitary lattice [10]
   1         1        1     0       1     Complex factorizations [36]
   1         1        0     1       1     Linear phase IIR solutions, section 5
   1         1        0     1       0     Battle-Lemarie bases, Meyer bases
   1         0        0     1       1     Characterized by theorem 3.2

Table 3: Properties which are simultaneously achievable for two-channel filter banks, and
comments on the solutions. A "1" in a particular box indicates that the solution necessarily
has the corresponding property.
Multiresolution Broadcast for Digital HDTV Using Joint
Source-Channel Coding

K. Ramchandran*, A. Ortega†, K. M. Uz‡ and M. Vetterli§
Department of Electrical Engineering
and Center for Telecommunications Research
Columbia University, New York, NY 10027-6699

December  17,  1991 


Abstract 

The  use  of  multiresolution  joint  source-channel  coding  in  the  context  of  digital 
terrestrial  broadcasting  of  High  Definition  Television  (HDTV)  is  shown  to  be  an 
efficient  alternative  to  traditional  single  resolution  techniques.  While  the  single  res¬ 
olution  schemes  suffer  from  a  sharp  threshold  effect  in  the  fringes  of  the  broadcast 
area,  we  show  how  a  matched  multiresolution  approach  to  both  source  and  channel 
coding  can  provide  a  stepwise  graceful  degradation.  Furthermore,  this  multireso¬ 
lution  approach  improves  the  behavior,  in  terms  of  coverage  and  robustness  of  the 
transmission  scheme,  over  systems  that  are  not  specifically  designed  for  broadcast 
situations. This paper examines the alternatives available for multiresolution transmission,
through embedded modulation, possibly trellis-coded to increase coverage
range, and error correction codes. We present coding results and simulations of
noisy transmission. From a systems point of view, we also discuss the trade-offs
involved  in  the  choice  of  coverage  areas  for  the  low  and  high  resolution,  as  well  as 
the  comparative  costs  and  complexities  of  the  different  multiresolution  transmission 
alternatives. 

*Work supported in part by the New York State Science and Technology Foundation's CAT.

†Work supported in part by the Fulbright Commission and the Ministry of Education of Spain.

‡Work supported by the National Science Foundation under grants ECD-88-11111. K. M. Uz is now
with David Sarnoff Research Center in Princeton, NJ 08543.

§Work supported in part by the National Science Foundation under grants ECD-88-11111 and
MIP-90-14189.
1  Introduction

General discussion of the problem

Recent advances in video compression techniques have spurred interest in the idea of digital
HDTV. Even the most demanding delivery mechanism, namely terrestrial broadcast,
might turn digital. Digital broadcast differs from digital point-to-point transmission in
that different receivers have different channel capacities, i.e. channel capacity decreases
with distance from the emitter. Furthermore, in a digital environment, the transition
from reliable to unreliable reception is very abrupt, creating the so-called threshold effect.
Hence, if digital broadcast is tackled as a single resolution (SR) problem, one would in
effect be designing for the fringes of the coverage area, thus reducing the spectral efficiency
in areas close to the emitter, as pointed out in [1]. In light of the current interest
in digital terrestrial broadcast of HDTV in the U.S., the concern for spectral efficiency
becomes even more pressing, especially given the conditions set by the FCC in terms of
bandwidth allocation.

The approach of designing for the fringes is known from information theory to be
suboptimal; when dealing with different channels, one can do better than to transmit
only for the worst one, or to perform "naive" time or frequency multiplexing between the
different channels. Cover [2] showed that one could trade capacity from the poor channels
for more capacity in the better ones, and that the trade-off can in theory be worthwhile.
These ideas point out the efficiency of using a multiresolution (MR) approach to digital
broadcast. However, to the best of the authors' knowledge, no real end-to-end system
has been designed using these results.

We approach this problem as one of joint source and channel coding in a multiresolution
(MR) framework, extending our work of [3] (see Figure 1 for an example of a
two-resolution system). In the two-resolution case, the source is split into "base" information,
the coarse channel, and "refinement" information, the fine channel. As in Figure
1, the idea is to match the different resolution levels to different channel capacities, thus
creating a MR channel coding scheme, so that the receiver closer to the emitter can decode
the full quality signal, while the distant receiver has access to the lower resolution
quality, providing a stepwise graceful degradation. Furthermore, we show that the use
of error concealment in the source decoder of a MR system (see Figure 1) improves the
robustness of the full resolution signal, thus increasing the coverage of "indistinguishable
quality" delivery over SR schemes.

Note that, throughout this paper, we use "coarse" synonymously with the lower resolution channel
and "detail" or "fine" with the refinement or augmentation channel of the two-resolution hierarchy.
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Figure  1:  Block  diagram  of  a  multiresolution  digital  broadcasting  scheme  shown  for  two 
receivers  with  channel  capacities  Cj  and  Cj  with  C:  <  Cj. 



We explore the available alternatives to an embedded transmission design and show
how MR modulation schemes, combined with trellis coded modulation (TCM) techniques,
can be used for this purpose, while pointing out the relative difficulty of designing efficient
Error Correction Codes (ECC) to solve the same problem. We consider, in our experiments,
a specific high quality MR HDTV coder [4] whose coarse to refinement channel bit
rates are in the ratio of 1:2. We assume a spectral efficiency of 6 bits/symbol for our specific
example, though, depending on the available broadcast bandwidth, other scenarios
may use 3-4 bits/symbol. We evaluate the performance of the system in terms of both
coverage area and subjective quality.

Past  and  current  work 

Most proposals to the FCC for digital terrestrial broadcast in the U.S. initially approached
the problem as one of point-to-point transmission. The idea of graceful degradation,
previously proposed as a natural advantage of multiresolution systems [5], has been recently
included in the AT&T/Zenith proposal [6], a change from their single resolution scheme
advocated earlier [7]. The Sarnoff/NBC/Philips/Thomson [8] proposal includes prioritization
in its coding scheme, but does not possess the "embedded" MR transmission to be
described in this paper. The idea of efficient multiplexing of the different resolutions of a
MR transmission scheme has been studied, using multidimensional constellations, in [9],
although a joint source and channel coding design is not addressed. Schreiber has pointed
out [1, 10] the problem of spectral efficiency for broadcast, and has proposed a hybrid
analog-over-digital scheme, which, though multiresolution in nature, does not fully exploit
recent advances in digital compression technology. Note that though several works
[11, 12, 13] have considered, in different contexts, the problem of joint source channel
coding of images, none has tackled the problem in a broadcast scenario.

Outline  of  paper 

The outline of the paper is as follows. Section 2 presents the digital broadcast problem
and suggests a multiresolution formulation. Section 3 reviews MR video coding [14] and
summarizes the specific scheme [4] that is used in this paper for the HDTV source coding.
Section 4 discusses the idea of MR transmission for broadcast channels. It reviews the
classic idea of embedding [2] and shows how it can be applied to digital broadcast. We
introduce the concept of embedded constellations and show, through a series of examples,
how these, possibly combined with Trellis Coded Modulation (TCM) and ECC’s, can
provide an efficient solution. Section 5 discusses the alternatives and proposes a recipe
for the broadcast problem as posed in Section 2. Finally, Section 6 verifies the benefits
of using an embedded multiresolution design and illustrates the robustness achievable by
using efficient error concealment techniques in a MR coding environment.

2  The  digital  broadcast  problem:  a  MR  formulation 

While Shannon [15] established the theoretical optimality of separating source
coding (the removal of redundancy from a source) from channel coding (the insertion of
redundancy to combat a noisy channel), his results hold only in the limit of infinitely complex
and long codes and, more importantly, only for a single channel or point-to-point communication
system. For the broadcast or multichannel environment, where a source communicates
with a multitude of receivers of varying strengths, Cover [2] established that optimal
broadcast scenarios are multiresolution or embedded in character, as will be explained in
detail in section 4. This justifies the choice of a multiresolution (MR) source coding scheme
to represent a source compactly in a hierarchy of resolutions, to which a “matched” MR
transmission can be designed in order to produce an efficient end-to-end design.

2.1  Matched  MR  source  channel  coding 

While  the  problem  of  joint  source  and  channel  coding  has  been  addressed  previously 
in  various  coding  contexts,  as  stated  in  Section  1,  in  this  paper,  we  propose  the  idea 
of  designing  an  end-to-end  joint  MR  system,  i.e.  one  which  includes  a  MR  channel 
coding  scheme  (an  analog  MR  constellation,  possibly  using  a  MR  Trellis  Coded  Modulation 
(TCM),  and/or  a  digital  MR  ECC)  that  is  matched  to  the  MR  source  coding  scheme 
outlined  in  section  3. 

Figure 2 outlines the importance of employing a joint design. For the different receiver
Carrier-to-Noise Ratios (CNR’s) throughout the broadcast area, the MR digital
transmission system (see Figure 2(a)) can reliably deliver different user bit rates.

The  idea  is  to  design  the  MR  source  and  channel  coders  so  that  their  delivered  rates  are 
efficiently  matched.  The  channel  rates  correspond  to  the  MR  modulation  scheme,  while 
the  source  rates  refer  to  the  different  resolutions  of  the  source  coder,  whose  characteristics 
are  shown  in  Figure  2(d)  resulting  in  the  broadcast  characteristics  of  Figure  2(c). 

This paper suggests an efficient way to do this matching. We explain how the MR
channel coder curve, which we attempt to match to the MR source coder, can be designed
using the concept of embedded transmission, using a modulation parameter A. Note that
while embedded transmission for broadcast is efficient even for a single resolution source,
it is even more natural to invoke when the source coder is hierarchical in nature, as is our
case, to be described in Section 3.

*Note that while we use SNR as a source quality measure in this discussion, we do so with the usual
disclaimer that while perceptual measures are more meaningful, they are difficult to quantify. Besides,
any meaningful measure can be used in place of SNR without changing the nature of the joint source
channel coding philosophy we outline here. Also note that SNR is a source quality measure, and CNR is
a channel quality measure.

Figure 2: Matching of MR source and channel coders for desired broadcast characteristics.
(a) MR channel coder characteristics (Rate vs. CNR). (b) Matching of threshold rates
of channel and source coders to achieve desired broadcast characteristics. (c) Achieved
broadcast characteristics. (d) MR source coder characteristics (SNR vs. Rate).

To solve the matching problem of Figure 2(b), i.e. to design the source and channel coders
with matched rates, one needs a broadcast performance criterion over which to optimize
the parameters. We now address this problem and suggest a tractable formulation.

2.2  The  problem  of  choosing  a  cost  function 

The main difficulty in assessing the performance of a digital broadcast system is that of
defining a cost function. In other words, one would like to have some way of measuring
the performance of a system in terms of, say, the coverage area and the delivered quality,
for a given set of resources to be used, such as bandwidth and power. When studying
a digital broadcast problem, this measure is not simple; the threshold effect mentioned
earlier, simplistically stated, boils down to a trade-off between coverage area and quality
of reception in the case of a single resolution scheme. A multiresolution scheme will face
the same trade-off but in a more complex way. For example, in the two resolution case,
one can trade off high quality (full resolution) coverage area for a lower quality (lower
resolution) coverage area, as well as the quality of each resolution for a larger coverage
area without affecting the area of the other resolution. Would it be better to cover a
wide area with relatively low quality or a small area with high quality? The answer is not
obvious and points to the lack of a clear cost function for this problem. However, making
some assumptions about both the system and the requirements helps us set the system
parameters without resorting to a cost analysis.

2.3  Setting  the  objectives  for  the  system 

Assume  a  two-resolution  system.  It  is  reasonable  to  expect  the  system  to  provide  the 
two  possible  grades  of  service  (full  resolution  closer  to  the  emitter  and  a  reduced  but  still 
acceptable quality further away) for the respective areas defined by distances of dc and dj
from the emitter (dc > dj). The crucial point is to define what those distances represent
in  terms  of  quality.  Since  different  systems  will  deliver  different  qualities,  it  is  convenient 
to  define  those  distances  as  the  maximum  distances  at  which  each  channel  is  received 
reliably  (see  Figure  3).  We  can,  for  instance,  equate  reliability  with  the  delivered  error 
rate  being  below  a  desired  threshold.  In  summary,  the  system  requirements  can  be  set 
up  in  terms  of  providing  full-resolution  and  lower-resolution  quality  at  certain  specified 
distances  from  the  emitter.  Now,  the  source  and  channel  coding  have  to  be  chosen  so  as 
to  guarantee  that  the  required  areas  are  covered,  while  maximizing  the  received  quality. 

Before  we  address  a  way  of  dealing  with  the  stated  problem,  we  analyze  the  system 
components,  namely  the  source  and  channel  coders. 

3  Multiresolution  source  coding 

Many popular and efficient source coding schemes are either directly or indirectly MR
in nature. Methods like subband and wavelet coding have a natural multiresolution
interpretation, while others, like DCT based techniques, which represent a common theme
in all the digital HDTV proposals to the FCC, have an “acquired” MR interpretation. For
a comprehensive review of multiresolution digital coding techniques, the reader is referred
to [14].

Multiresolution (MR) source coding schemes can be seen as successive approximation
methods. While they can be slightly suboptimal in terms of compression compared with a
single resolution (SR) scheme that achieves the same full resolution quality for point to
point communications, they can be superior in a broadcast situation, which is a multiuser
communications problem. Even for point to point communications, it has been shown
[16, 17] that MR source coding, using a successive refinement approach, can
be theoretically optimal under certain conditions, and recently, an efficient practical MR
source coder has been suggested [18] that compares very favorably with single resolution
approaches that achieve the same full-resolution quality. The advantage of MR based
schemes over SR schemes in a broadcast environment comes from the presence, in the
former, of a coarse channel (which comes as a “by-product”) that, combined with error
concealment techniques used at the source decoder, can be used to increase robustness.

Figure 3: (a) Definition of the coverage area. (b) Distances as a function of the delivered
quality.

Figure 4: Reconstruction of the pyramid. (a) One step of coarse-to-fine scale change.
(b) The reconstructed pyramid. Note that approximately one half of the frames in the
structure (shown as shaded) are spatially coded/interpolated.

A  more  detailed  analysis  of  this  robustness  issue  will  be  made  in  section  6.  A  note  of 
interest,  especially  in  the  wake  of  the  ongoing  standards  and  compatibility  debates,  is  that 
a  MR  decomposition  affords  a  hierarchy  of  resolutions  that  are  both  natural  and  useful 
for  the  compatibility  and  broadcast  problems. 

A  specific  MR  source  coding  scheme  for  HDTV 

We  now  review  the  MR  source  coder  that  is  an  integral  part  of  the  joint  MR  source  and 
channel  coding  method  we  undertake  in  this  paper.  Our  MR  video  coder  [4]  is  a  three- 
dimensional  pyramidal  decomposition,  based  on  spatiotemporal  interpolation,  forming  a 
hierarchy  of  video  signals  at  increasing  temporal  and  spatial  resolutions  (see  Figure  4  (b)). 

The  structure  is  formed  in  a  bottom-up  manner,  starting  from  the  finest  resolution,  and 
obtaining  a  hierarchy  of  lower  resolution  versions.  Spatially,  images  are  subsampled  after 
anti-aliasing  filtering.  Temporally,  the  reduction  is  achieved  by  simple  frame  skipping, 
since  temporal  filtering  would  be  inadequate  when  motion  is  present  (it  would  produce 
double  images). 

The encoding is done in a stepwise fashion, starting at the top layer and working down
the pyramid in a series of successive refinement steps. The coarse-to-fine scale change
step is illustrated in Figure 4 (a). At each step, first the spatial resolution is increased by
linear interpolation, then the temporal motion based interpolation is done based on these
new frames at the finer scale. We describe the interpolation procedure only briefly, and
refer the reader to [4] for details.

The unshaded frames shown in Figure 4 (b) are interpolated in time. For these frames,
the encoder computes a set of motion vectors that are transmitted along with the residual,
i.e. the difference between the original and the interpolated frame. The motion vectors
are computed in a MR fashion, using a hierarchical blockmatching algorithm [4]. For each
block in the interpolated frame, three different motion vector candidates, one for each of the
following interpolation modes, are considered: (i) backward interpolation: the motion vector
that yields the best replacement from the previous frame; (ii) forward interpolation: the
motion vector that yields the best replacement from the next frame; (iii) motion averaged
interpolation: the motion vector d that yields the best replacement by averaging the block
displaced by d in the previous frame and displaced by -d in the next frame. The mode that
results in the best interpolated block (in the MSE sense) is selected, and the mode selection
information is also encoded and transmitted to the receiver.
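
As a rough illustration of the mode selection just described, the following Python sketch performs an exhaustive search over a small displacement window and keeps the mode and motion vector with the lowest MSE. It is not the hierarchical blockmatching coder of [4]: the function names, the 8x8 block size and the search range of 2 pixels are all assumptions made only for this example.

```python
import numpy as np

def block_at(frame, top, left, size):
    """Return the size x size block at (top, left), or None if it leaves the frame."""
    h, w = frame.shape
    if top < 0 or left < 0 or top + size > h or left + size > w:
        return None
    return frame[top:top + size, left:left + size].astype(float)

def select_mode(prev, nxt, target, top, left, size=8, search=2):
    """Return (mse, mode, (dy, dx)) for the best of the three interpolation modes.

    Hypothetical full-search version: every displacement d in the window is tried
    for backward (previous frame), forward (next frame) and motion averaged
    (previous frame at +d averaged with next frame at -d) interpolation.
    """
    ref = block_at(target, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            back = block_at(prev, top + dy, left + dx, size)
            fwd = block_at(nxt, top + dy, left + dx, size)
            mirrored = block_at(nxt, top - dy, left - dx, size)
            averaged = None
            if back is not None and mirrored is not None:
                averaged = 0.5 * (back + mirrored)
            candidates = {"backward": back, "forward": fwd, "averaged": averaged}
            for mode, cand in candidates.items():
                if cand is None:
                    continue
                mse = float(np.mean((ref - cand) ** 2))
                if best is None or mse < best[0]:
                    best = (mse, mode, (dy, dx))
    return best
```

The selected mode and motion vector would be sent as side information, and the residual `ref - cand` for the winning candidate is what the DCT based coder then has to encode.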

A  discrete  cosine  transform  (DCT)  based  coder  is  used  to  encode  the  top  layer  and  the 
subsequent  bandpass  difference  images.  Quantizer  steps,  and  consequently  bit  allocation 
at  different  levels  in  the  hierarchy  are  determined  to  obtain  good  perceptual  quality. 
Another  major  consideration  in  the  bit  allocation  scheme  is  in  “matching”  the  subsequent 
channel  coding,  to  be  described  later  in  the  paper. 

It is important to note that, for the MR source coder we consider in our system, if
one resorts to a two-resolution hierarchy comprising the two coarsest layers of the
spatiotemporal pyramid in the coarse resolution source channel, and the difference layer in the
detail channel, then the bit ratio of coarse to detail information is roughly 1:2 at high
perceptual quality for typical sequences. This ratio is more accurate if one considers that
the “vital” overhead associated with motion vectors and synchronization would have to
be carried in the lower resolution channel as well. This 1:2 ratio is a key parameter in the
development of our joint MR source channel coding system.

4  Multiresolution  transmission:  embedding 

The problem of efficient communication of digital information from a single source to
multiple receivers with various Carrier-to-Noise Ratios (CNR’s) is key to digital broadcast of
HDTV. While the theory of digital broadcast has received attention in the early
information-theoretical literature [2, 16, 19], there is no evidence that the theoretical
maxims proffered in [2] have been applied to the design of practical digital broadcast channels.
An efficient end-to-end broadcast system should have its transmission constellation matched to
its source coding scheme, and this is the crux of our work, which we undertake in a
multiresolution environment.


4.1  Efficiency  of  using  embedding  for  digital  broadcast 

Figure 5(a) depicts a typical broadcast environment, with a source wishing to convey
information {r, s1} to a stronger receiver and {r, s2} to a weaker one. Note that r
represents the common message to be conveyed to both receivers. In [2], Cover establishes the
efficiency of superimposing information, i.e. broadcasting in a multiresolution embedded
fashion, where the detailed information meant for the stronger receiver necessarily
includes the coarse information meant for the noisier receiver. The efficiency of embedded
broadcast, in terms of theoretically deliverable bitrate, compared to independent sharing
of the broadcast channel resources in time or frequency among the receivers is depicted in
Figure 5(b), where the superior curve is obtained by superimposing the detail information
within the coarse information. That is, the superior receiver 1, in an optimal scenario,
necessarily has access to the information {r, s2} meant for the weaker receiver 2. Note
that the plot portrays the potentially deliverable bitrates, which are upper bounded by
the Shannon capacities of the channels, and has the same drawback of providing no more
than existential knowledge, as in Shannon’s classical results on channel coding [15]. In
this work, we show a practical way of realizing this embedding gain.
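
To make the comparison between embedding and resource sharing concrete, the sketch below evaluates the standard textbook rate expressions for a two-receiver Gaussian broadcast channel: superposition with a power split alpha, and time sharing with a time split t. These formulas are not reproduced from this paper, and the power and noise values used in the usage note are hypothetical.

```python
import math

def capacity(snr):
    """Shannon capacity of a real Gaussian channel, in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)

def superposition_rates(alpha, power, n_strong, n_weak):
    """(detail rate, coarse rate) when a fraction alpha of the power carries detail.

    The weak receiver treats the detail signal as extra noise; the strong
    receiver decodes the coarse message first and removes it.
    """
    r_detail = capacity(alpha * power / n_strong)
    r_coarse = capacity((1.0 - alpha) * power / (alpha * power + n_weak))
    return r_detail, r_coarse

def time_sharing_rates(t, power, n_strong, n_weak):
    """(detail rate, coarse rate) when a fraction t of the time carries detail only."""
    return t * capacity(power / n_strong), (1.0 - t) * capacity(power / n_weak)

def time_sharing_coarse_rate(r_detail, power, n_strong, n_weak):
    """Coarse rate left over by time sharing once the detail rate is fixed."""
    t = r_detail / capacity(power / n_strong)
    return time_sharing_rates(t, power, n_strong, n_weak)[1]
```

For instance, with the assumed values `power=10.0`, `n_strong=1.0`, `n_weak=5.0` and `alpha=0.3`, the coarse rate returned by `superposition_rates` exceeds the value of `time_sharing_coarse_rate` evaluated at the same detail rate, which is the kind of gap the two curves of Figure 5(b) describe.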


Figure 5: Typical broadcast environment. (a) Single source broadcasting to receivers of
channel capacities C1 and C2. (b) Set of achievable broadcast bitrates for receivers 1, 2.


4.2  Embedding  in  the  modulation  domain 

Cover’s concept of embedding the coarse information within the detailed information is
generic in scope, and places no restrictions on the domain in which the embedding should
be performed. To describe the effect of an analog domain embedded modulation, we refer
to Figure 6 to point out some typical MR embedded modulation constellations. The basic
idea is that each constellation consists of “clouds” of mini-constellations or “satellites,”
where the detail information is represented in the satellites, while the coarse information
is carried in the clouds. Thus, the loss of coarse information is associated with the
receiver’s inability to decipher correctly which cloud was transmitted, while the loss of
refinement information occurs from the receiver’s confusing one intracloud signal point for
another. The decoder first decodes the likeliest cloud (coarse information), “subtracts”
the decoded cloud value from the received point, and then decodes the likeliest satellite
within the cloud (detail information). Thus, the MR 16 QAM constellation of Figure
6(a) has 4 bits/symbol, of which 2 bits are coarse (4 clouds) and 2 bits are detail (4
satellites/cloud). Similarly, the MR 16 PSK scheme has 2 coarse bits/symbol and 2 detail
bits/symbol, while the 4 PAM constellation has 1 bit/symbol of each. For our specific
source coder, we consider the MR 64 QAM constellation of Figure 7.

Figure 6: Some multiresolution constellations: (a) MR 16 QAM, (b) MR 16 PSK, (c)
MR 4 PAM.
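
The two-stage decoding rule (cloud first, then satellite) can be sketched for a MR 16 QAM constellation as follows. The cloud centers, the satellite scale of 0.25 and the index-to-point assignment are assumptions chosen for illustration; they are not taken from Figure 6.

```python
import numpy as np

# Hypothetical geometry: 4 cloud centers (coarse, 2 bits) and, inside each
# cloud, 4 satellites (detail, 2 bits) at one quarter of the cloud spacing.
CLOUDS = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
SATELLITES = 0.25 * np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])

def modulate(coarse, detail):
    """Map a (coarse, detail) index pair to one of the 16 signal points."""
    return CLOUDS[coarse] + SATELLITES[detail]

def demodulate(received):
    """Decode the likeliest cloud, subtract it, then decode the likeliest satellite."""
    coarse = int(np.argmin(np.abs(received - CLOUDS)))
    residual = received - CLOUDS[coarse]
    detail = int(np.argmin(np.abs(residual - SATELLITES)))
    return coarse, detail
```

With this geometry, a disturbance of -0.4 added to `modulate(0, 0)` is decoded as `(0, 1)`: the cloud (coarse information) is still correct while the satellite (detail information) is not, which is the graceful degradation the embedded constellation is meant to provide.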

While we present a two resolution hierarchy, the principles hold for any number of
hierarchical levels desired, and would result in a “fractal” modulation constellation,
although at increased complexity and decreased practicality. We point out later how one
can combine an embedded ECC scheme with an embedded modulation scheme to increase
the number of broadcast resolutions in a practical manner without sacrificing efficiency
in the information-theoretical sense.

4.2.1  MR  64-QAM 

Consider as Example A the constellation of Figure 7. For every 6 composite bits per
channel symbol emitted by the 1:2 source (see Section 3), 2 coarse bits select one of
the 4 clouds, while the remaining 4 detail bits select one of the 16 satellites within the
selected cloud. By “matching” the relative distances between intra-cloud constellation
points (D1) and inter-cloud points (D2), whose ratio is a design parameter A, to the
relative “information contents” of the two bitstreams, one obtains an efficiently designed
joint MR source/MR transmission system. One could determine an optimal “broadcast
A” if a meaningful cost function over the broadcast area (which would probably include
factors like population density) were available. On the other hand, due to the difficulty of
this model, as pointed out in Section 2, one may instead choose, as an operating point,
the maximum value of A that meets the full-resolution coverage range requirement, as will
be discussed in Section 5.

Figure 7: Example A: MR 64 QAM system constellation with definitions of A and of the
distances d^k_inter(i) and d^k_intra(i), and the “fine” bit mapping of the constellation
signal points according to the well-known Karnaugh-map partitioning. Note that A=0
represents uniform 4 QAM, while A=1 denotes uniform 64 QAM.

Figure 8: Typical broadcast environment. (a) MR QAM system block diagram (Example
A). Note that the modulator and demodulator are operated at transmission parameter
A (see Figure 7). (b) Achievable performance (no packet loss probabilities) for the practical
system of Fig. 8(a). See analogy with theoretical curve of Fig. 5(b).
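
One possible way to generate a MR 64 QAM constellation controlled by a single parameter is sketched below. It assumes that A is the ratio of the intra-cloud spacing D1 to the gap D2 between the nearest points of adjacent clouds; this interpretation is an assumption, chosen because it reproduces the two limiting cases stated for Figure 7 (A=0 gives 4 QAM, A=1 gives uniform 64 QAM). The cloud and satellite ordering and the bit packing are likewise hypothetical.

```python
import numpy as np

def mr64qam(a):
    """Unit-average-power MR 64 QAM points, indexed as [cloud, satellite].

    Assumed definition: a = D1 / D2, with D1 the spacing between satellites
    in a cloud and D2 the gap between the closest points of adjacent clouds.
    """
    d2 = 1.0                                   # arbitrary scale, removed by normalization
    d1 = a * d2
    offsets = d1 * np.array([-1.5, -0.5, 0.5, 1.5])
    center = 0.5 * d2 + 1.5 * d1               # per-axis position of a cloud center
    satellites = (offsets[:, None] + 1j * offsets[None, :]).ravel()
    signs = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
    points = np.array([center * (sx + 1j * sy) + satellites for sx, sy in signs])
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def map_bits(bits6, a):
    """2 coarse bits select the cloud, the remaining 4 detail bits select the satellite."""
    cloud = bits6[0] * 2 + bits6[1]
    satellite = bits6[2] * 8 + bits6[3] * 4 + bits6[4] * 2 + bits6[5]
    return mr64qam(a)[cloud, satellite]
```

Because the average power is normalized for every value of `a`, increasing `a` spreads the satellites at the expense of the distance between clouds, which is the coarse versus detail robustness trade-off controlled by A.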

The appendix contains the mathematical analysis of the coarse and detail channel
performance of the MR 64 QAM of Figure 7, on which the curves shown in Figure 9
are based. Note that those curves reflect packet error rates for the two channels, where a
composite packet of length 1080 bits (with 1/3 coarse and 2/3 detail information embedded
in it) is used to prevent error propagation.
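
The relation between bit errors and the loss of each packet component can be illustrated with a very simple model. The sketch below assumes independent bit errors and declares a component lost as soon as one of its bits is wrong; the analysis in the appendix may well differ, so this is only an order-of-magnitude aid.

```python
PACKET_BITS = 1080
COARSE_BITS = PACKET_BITS // 3            # 1/3 of the composite packet
DETAIL_BITS = PACKET_BITS - COARSE_BITS   # remaining 2/3

def packet_error_rate(bit_error_rate, n_bits):
    """Probability that at least one of n_bits independent bits is in error."""
    return 1.0 - (1.0 - bit_error_rate) ** n_bits
```

Under this assumption, the coarse and detail components are evaluated separately, e.g. `packet_error_rate(p_coarse, COARSE_BITS)` and `packet_error_rate(p_detail, DETAIL_BITS)`, where `p_coarse` and `p_detail` are the (hypothetical) bit error rates of the two channels at a given CNR and A.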

While the details are provided in the appendix, it is important to mention a few
salient features. Note that d^k_intra(i) and d^k_inter(i) represent half the Euclidean distances
between signal point i and its nearest detail (intracloud) and coarse (intercloud) neighbors,
respectively, in the k-direction. Also, it must be emphasized that the topology of the
equivalent constellation at the broadcast receiver is a function of the CNR and A.
Qualitatively, the CNR affects the “radius” of the constellation as seen at the receiver for
a fixed noise variance, while A affects the relative distances between intercloud and
intracloud points. As A goes from 0 to 1, the intracloud and intercloud thresholds decrease
and increase, respectively, for a fixed power budget, indicating the quantitative tradeoffs
involved in coarse and detail channel robustness as shown in Figure 9. Also, note that as we
can always form a Gray-code fine channel digital bit-mapping of the constellation points
exactly as in Karnaugh maps used in digital logic design [20] (see Figure 7), we can
guarantee that every point in the constellation is at Hamming distance one away from each
of its intracloud nearest neighbors. Thus, assuming that single bit errors dominate when
symbol errors occur, we can equate symbol errors with bit errors. This leads to an efficient
mapping, besides aiding in the mathematical analysis.
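
The Karnaugh-map style labeling mentioned above can be checked with a few lines of Python. The sketch labels a 4x4 cloud with a 2-bit Gray code along each axis; the particular bit assignment is an assumption, since the labels of Figure 7 are not reproduced here.

```python
GRAY2 = [0b00, 0b01, 0b11, 0b10]   # 2-bit Gray code along one axis

def fine_label(row, col):
    """4-bit detail label of the satellite at (row, col) of a 4x4 cloud."""
    return (GRAY2[row] << 2) | GRAY2[col]

def hamming(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")

def neighbors_differ_by_one_bit():
    """Check every horizontally or vertically adjacent pair of satellites."""
    for row in range(4):
        for col in range(4):
            if col < 3 and hamming(fine_label(row, col), fine_label(row, col + 1)) != 1:
                return False
            if row < 3 and hamming(fine_label(row, col), fine_label(row + 1, col)) != 1:
                return False
    return True
```

Since nearest intracloud neighbors differ in exactly one bit, a symbol error to a nearest neighbor costs a single detail bit, which is what allows symbol errors to be equated with bit errors in the analysis.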

Due to favored protection of the coarse stream via the parameter A, it is possible
for the fine packet component to be corrupted, while the coarse packet component is
received reliably for the same composite packet. The dotted curves in Figure 9 refer to a
“naive” multiplexing of the broadcast channel between the coarse and detail information
streams, under conditions of equal power, bandwidth, and average spectral efficiency, as
will be explained in Section 5. The curves clearly show the superiority of embedding
over multiplexing. For example, for values of A from 0.2 to about 0.4, both coarse and
detail channel performances are better than those of the multiplexed case. The particular
multiplexing point shown in the figure is obtained when the powers in the coarse and detail
constellations are made equal, though similar performance improvement can be obtained
by embedding over any other multiplexing point, corresponding to different values of
A. This is a verification of the information theoretical result that embedding outperforms
multiplexing (see Figure 8 (b)).

4.3  Embedded  TCM  constellation 

In order to increase reliability of reception over the demanding broadcast channel and
to increase coverage area, it may be necessary to add more redundancy to protect the
broadcast information. As is well known, convolutional codes deploying a Euclidean
distance metric can achieve better performance for the same complexity than the more
commonly used block ECC’s, which use a “hard limiting” Hamming distance metric.
Convolutional (trellis) codes achieve coding gain by using soft decoding with the Viterbi
algorithm [21]. Conventional convolutional coding, like block coding, would require an
increase in bitrate to accommodate the redundant bits, which must come at the expense
of lowered source coding quality, for a fixed total throughput. However, it is possible to
achieve almost all the coding gain theoretically possible, i.e. to approach the Shannon
limit, by expanding the 2-D modulation constellation by a factor of 2, and deploying
a redundant constellation via Trellis Coded Modulation, as established by Ungerboeck
[22]. While multi-dimensional TCM [23, 24] can provide the same gain for a smaller
expansion factor than the 2-D Ungerboeck constellations, we restrict ourselves to the
latter, in the interest of simplicity of design and analysis. The novelty here is that we
combine the concept of multiresolution with the power of TCM to propose an embedded
TCM modulation for efficient broadcast of a MR source (see Figure 11 (a)).

Figure 9: Example A: Probability of packet error vs. receiver CNR over the entire range
of transmission parameter A for the embedded MR 64 QAM case and a composite packet
length of 1080. (a) Fine channel packet loss. (b) Coarse channel packet loss.

An  Ungerboeck  TCM  scheme  requires  an  expansion  factor  of  2  in  the  constellation 
size.  Thus,  our  original  MR  64  QAM  constellation  would  be  expanded  to  128  QAM,  using 
the  same  power  as  the  former.  Of  course,  this  large  constellation  size  is  for  our  specific 
example  (Example  B:  see  Figure  10):  a  more  practical  example  for  HDTV  broadcast 
might  be  expansion  of  a  MR  16  QAM  scheme  (with  a  1:1  coarse  to  detail  bitrate  ratio,  as 
in  [25],  using  2  bits/symbol  for  each  resolution)  into  an  embedded  TCM  32  QAM,  which 
is  certainly  practical  in  size.  The  principle  of  operation  is  what  is  important.  The  idea 
for  the  TCM  128  QAM  scheme  (see  Figure  11  (a))  is  that  the  coarse  information  retains 
preferential  protection  through  A,  while  the  detail  information  gets  expanded  from  16 
points  to  32  points  per  cloud  via  a  TCM  coding  scheme.  Figure  11  (a)  shows  the  first  level 
set-partitioning  for  each  32  point  cloud  into  the  subset  marked  “a”  and  its  complement 
(unmarked),  each  subset  enjoying  a  3  dB  gain  in  squared  Euclidean  minimum  distance 
over  that  of  its  parent,  as  needed  for  an  Ungerboeck  code. 
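
The 3 dB figure quoted for the first level of set partitioning can be verified numerically. The sketch below assumes that each 32 point cloud is a cross constellation (a 6x6 grid without its four corners) and that the first partition is the checkerboard split; both are assumptions, as the layout of Figure 11 (a) is not reproduced here.

```python
import itertools
import math

def cross32():
    """32-point cross constellation: a 6x6 integer grid without its four corners."""
    return [(x, y) for x in range(6) for y in range(6)
            if not (x in (0, 5) and y in (0, 5))]

def min_sq_distance(points):
    """Smallest squared Euclidean distance between any two points."""
    return min((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in itertools.combinations(points, 2))

def partition(points):
    """First-level set partitioning: checkerboard split by coordinate-sum parity."""
    even = [p for p in points if (p[0] + p[1]) % 2 == 0]
    odd = [p for p in points if (p[0] + p[1]) % 2 == 1]
    return even, odd

def partition_gain_db(points):
    """Gain in squared minimum distance of a subset over its parent, in dB."""
    even, _ = partition(points)
    return 10.0 * math.log10(min_sq_distance(even) / min_sq_distance(points))
```

Each subset keeps 16 points, and its squared minimum distance is twice that of the parent cloud, i.e. a gain of about 3 dB, in line with the requirement stated above for an Ungerboeck code.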

Figure  11  (b)  shows  the  coding  gain  for  the  fine  channel  (the  coarse  channel  remains 
unchanged)  for  A  =  0.3  for  trellises  with  4,  8,  and  16  states.  The  coding  gain  over  the 
unexpanded  MR-64  QAM  constellation  is  seen  to  be  consistent  with  that  tabulated  in 
[22].  Thus,  the  simple  4  state  trellis  is  seen  to  provide  a  coding  gain  of  3  dB/symbol  in 
CNR.  Identical  gains  in  detail  channel  protection  will  occur  for  any  desired  value  of  A. 
Thus,  for  an  efficient  end-to-end  MR  design,  one  may  ensure  the  coarse  channel  robustness 
through A, while using a TCM code of acceptable complexity to achieve the desired
full-resolution coverage area. An important feature of our MR system is that due to inclusion
of error concealment techniques at the decoder (see Section 6.1), it is possible to obtain
indistinguishable full-resolution quality even at a fine-channel packet loss rate exceeding
10^-1. As seen from Figure 11(b), at this high loss rate, one gets marginal return from
using trellises over 4 states, thus making our TCM design nearly optimal with only a
simple 4-state trellis! It is important to note that this scheme permits operation with no
decrease in source coding bit rate over that of an uncoded system.

Figure 10: Example B: Block diagram of an embedded MR TCM system using a 128
QAM constellation. Note that it consists of 4 clouds of trellis-code-modulated 32 QAM
constellations.

Figure 11: Example B: (a) Expansion of MR 64 QAM into MR-TCM 128 QAM with
an expansion in constellation points of each cloud from 16 to 32. Note that the coarse
channel is unaffected. (b) Coding gain over MR 64 QAM for the detail channel using
MR-TCM 128 QAM for A = 0.3.

4.4 Embedding in the ECC domain: UEP codes

While  the  effect  of  providing  unequal  degrees  of  robustness  for  the  coarse  and  refinement 
channels  via  the  analog  parameter  A  was  discussed  above,  the  algebraic  coding  expert 
may  argue  correctly  that  one  can  achieve  similar  results  by  using  digital  techniques  like 
error  correcting  codes  with  unequal  error  protection  (UEP)  [26,  27,  28,  29,  13].  While  the 
TCM  constellation  mentioned  earlier  is  indeed  efficient,  there  may  be  practical  limitations 
to  expanding  the  constellation  size.  Moreover,  one  may  need  ECC’s  to  “bridge”  any 
mismatches  in  rate  between  the  source  coder  and  the  channel  modulator  (see  Figure  2). 

It  can  be  seen  that  embedding  in  the  modulation  and  ECC  domains  are  essentially 
equivalent. In the ECC domain, codewords of length n in (GF(2))^n are clustered into
“clouds”  whose  members  (“satellites”)  are  closer  in  Hamming  distance  with  respect  to 
one  another,  than  to  members  of  other  clouds.  Codes  having  this  behavior  would  then  be 
called  Unequal  Error  Protection  (UEP)  codes. 
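The cloud/satellite structure can be illustrated with a toy binary code. The sketch below is not taken from [30]; the 4-bit mapping `encode` is a hypothetical construction, chosen only so that codewords sharing a coarse bit are closer in Hamming distance than codewords belonging to different clouds.

```python
from itertools import product


def encode(coarse_bit, detail_bit):
    # Hypothetical toy mapping: the coarse bit is repeated three times,
    # the detail bit is sent once.
    return (coarse_bit, coarse_bit, coarse_bit, detail_bit)


def hamming(a, b):
    # Number of positions in which two codewords differ.
    return sum(x != y for x, y in zip(a, b))


codebook = {(c, d): encode(c, d) for c, d in product((0, 1), repeat=2)}

# Smallest distance between satellites of the same cloud (same coarse bit).
intra = min(hamming(codebook[(c, 0)], codebook[(c, 1)]) for c in (0, 1))

# Smallest distance between members of different clouds.
inter = min(
    hamming(codebook[(0, d0)], codebook[(1, d1)])
    for d0, d1 in product((0, 1), repeat=2)
)

print(intra, inter)  # intra = 1, inter = 3
```

In this toy case the coarse bit survives any single channel error while the detail bit does not, which is the unequal protection behavior described above.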

A UEP code can be described as an (n, k_1, k_2, t_1, t_2) code (where t_i represents the
number of channel errors the code can withstand for the information k_i). It has to be noted
that  using  a  UEP  code  is  by  no  means  the  only  way  to  provide  unequal  error  protection. 

As a first approach, one could use two different codes, one for each category of information,
but it is essential to note that embedding the codes can yield better codes (in terms of the
rate k/n) than using two separate codes. In other words, combining two (n_1, k_1, t_1) and
(n_2, k_2, t_2) codes to obtain a (n_1 + n_2, k_1, k_2, t_1, t_2) code can potentially be outperformed by
a (n, k_1, k_2, t_1, t_2) embedded code. As an example, consider a (63,12,24,5,3) binary cyclic
UEP code, listed in [30]. Alternatively, one can consider two smaller BCH codes, with
characteristics (31,11,5) and (31,12,3). The BCH codes can provide the same protection
but clearly their rates are worse than those obtained with the embedded code. To further
the analogy with the modulation domain, the use of different codes for the different classes
of information (as in [11]) can be likened to the “naive” multiplexing for transmission for
the two user broadcast channel.
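The rate comparison in this example can be checked with a few lines of arithmetic. The sketch below uses only the code parameters quoted above; the helper name `rate` is ours.

```python
from fractions import Fraction


def rate(n, *ks):
    # Code rate: total information bits divided by block length.
    return Fraction(sum(ks), n)


# Embedded (63, 12, 24, 5, 3) UEP code: 12 + 24 information bits in 63.
embedded = rate(63, 12, 24)

# Two separate codes (31, 11, 5) and (31, 12, 3) sent back to back.
separate = rate(31 + 31, 11, 12)

print(float(embedded), float(separate))
```

The embedded code carries 36 information bits in 63 channel bits (rate 4/7), against 23 information bits in 62 channel bits for the pair of separate codes.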

The advantages, in terms of rate, of using embedded UEP codes are clear, but they
come at a high price. Indeed, UEP codes require, in their design, a comparatively much
higher effort than the usual single resolution codes. Substantial work has been done in
designing UEP codes and, in particular, on providing bounds for the attainable rates
for codes, especially linear UEP (LUEP) codes, having these properties. However, no
structured method that avoids a brute-force computer search has been described
to design these codes. See Lin et al. [30] for a tabulation of all possible embedded ECC’s
of odd lengths up to 65. The codes listed in [30] are not appropriate for the application
considered in that, of those codes with ratio of coarse to detail information (k_2/k_1) close
to 2, few are efficient (i.e. with rates, (k_1 + k_2)/n, close enough to 1). Figure 12 (Example
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C)  presents  the  block  diagram  of  a  scheme  that  uses  embedded  UEP  codes,  and  Figure  13 
shows  the  results,  in  terms  of  packet  loss,  for  different  CNR’s,  when  several  of  the  codes 
tabulated  in  [30]  are  used. 


Figure  12:  Example  C:  Block  diagram  of  a  MR  system  with  embedded  ECC’s  for  the 
coarse  and  detail  channels. 

Thus, while UEP codes can be used instead of MR modulation schemes to perform
the MR transmission, the issue of designing good UEP codes is largely open and involves
a high degree of complexity. Following the above considerations, for our application, we
consider unequally error protected ECC’s designed independently for the coarse and detail
information channels. Using the same coarse packet size of 360 user bits (k) and various
levels of redundancy (n - k), we simulated the performance of various (n,k,t) ECC’s
(Example D). This example consists of protection of only the coarse channel to varying
degrees of robustness, while leaving the detail channel uncoded. See Figure 14 (a). Figure
14 (b) shows how using ECC’s lowers the probability of coarse packet loss over the range
of CNR’s of interest.
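To give a feel for how an (n,k,t) code lowers the coarse packet loss, the following sketch computes the loss probability under simplifying assumptions that are ours, not the simulation setup of Figure 14: a binary symmetric channel with independent bit errors of probability `p`, a decoder that corrects up to `t` errors per codeword and fails otherwise, and a 360-bit packet spread over whole codewords.

```python
from math import ceil, comb


def block_failure_prob(n, t, p):
    # Probability that more than t of the n channel bits are in error.
    correctable = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))
    return 1.0 - correctable


def packet_loss_prob(n, k, t, p, packet_bits=360):
    # A packet is lost if any of its codewords fails to decode.
    blocks = ceil(packet_bits / k)
    return 1.0 - (1.0 - block_failure_prob(n, t, p)) ** blocks


bit_error = 1e-3  # hypothetical channel bit error probability
uncoded = packet_loss_prob(360, 360, 0, bit_error)
coded = packet_loss_prob(127, 99, 4, bit_error)
print(uncoded, coded)
```

With these assumptions the uncoded packet is lost whenever any of its 360 bits is hit, while the BCH(127,99,4) parameters used later in the paper reduce the loss by several orders of magnitude at the same bit error probability.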

4.5  Hybrid  embedded  modulation/ECC  scheme 

It  must  be  noted  that  an  efficient  end-to-end  system  might  need  to  deploy  both  ECC  and 
MR  embedded  modulation  in  tandem  to  jointly  “match”  the  source  coder.  For  example, 
one  could  use  a  non-uniform  QAM  scheme  rather  than  the  uniform  QAM  constellation  in 
Figure  14  (a)  (Example  D).  Thus,  the  ECC  scheme  could  be  used  as  a  “bridge”  to  achieve 
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Figure  13:  Example  C:  Probability  of  packet  error  vs.  Receiver  CNR  for  some  known 
embedded ECC codes. Note that the 5-tuple (n, k_1, k_2, t_1, t_2) listed refers to the embedded
code  length,  the  coarse  bits  per  block,  the  detail  bits  per  block,  the  error-correction 
capability  for  the  coarse  bits  per  block,  and  the  error-correction  capability  for  the  detail 
bits  per  block,  resp.  The  packetsizes  of  the  coarse  and  detail  channels  are  360  and  720 
bits  resp.  (a)  Fine  channel  packet  loss,  (b)  Coarse  channel  packet  loss. 
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(a) 


Figure 14: Example D: Multiplexed (non-embedded) ECC using a family of BCH codes.
Note that error correction is applied to the coarse channel only, with the detail channel
sent unprotected. Note that the 3-tuple (n, k, t) listed refers to the code length, the coarse
bits per block, and the error-correction capability for the coarse bits per block. The packet
loss rate refers to a coarse packet length of 360 bits. (a) Block diagram. (b) Simulation
of coarse channel performance.


a  match  between  the  bitrates  (coarse  and  fine)  required  by  the  MR  constellation  and  the 
bitrates  (source  bits  plus  ECC  bits)  sent  through  each  of  the  channels  (see  Figure  2). 
Also, embedding in both the ECC domain and the MR modulation domain would lead to
an efficiently designed MR joint source-channel system with more than two resolutions,
without resorting to a complex “fractal” modulation constellation. This could be
accomplished, for example, for a three resolution design, by having the two coarsest resolutions
embedded in the ECC domain, and the resultant composite coarse bitstream
embedded in the third (detail) channel bitstream in the modulation domain as a 2-layer
embedding (see Figure 15).


5  An  efficient  end-to-end  system  design 

In  the  previous  section,  we  have  illustrated,  by  way  of  examples,  the  different  tools  one 
can  employ  to  design  an  efficient  broadcast  system.  Here,  we  undertake  a  comparison  of 
the  tradeoffs  involved  in  the  various  schemes,  and  then  provide  a  general  recipe  to  help 
solve  the  problem,  as  stated  in  Section  2. 
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Figure  15:  Example  E:  Block  diagram  of  a  MR  system  with  3  levels  of  resolution  using 
both  embedded  ECC’s  and  embedded  modulation  to  make  overall  design  efficient  and 
practical. 

5.1  Comparison  of  A-Modulation,  TCM,  and  ECC  schemes 

As  pointed  out  in  the  previous  section  through  Examples  A-E,  a  number  of  UEP  schemes 
can  be  invoked  to  ensure  efficient  MR  transmission.  Table  1  gives  the  coordinates  of 
Examples A-D of our paper.

The  A-modulation  scheme  of  Example  A  might  be  used  to  provide  a  desired  coverage 
range  for  the  coarse-resolution  signal,  and  a  “basic”  coverage  for  the  fine  channel,  with 
the MR TCM scheme of Example B used to increase the full-resolution coverage area
using  an  embedded  TCM  for  the  fine  channel.  While  the  ECC  scheme  of  Example  D 
can  be  used  instead  to  make  the  coarse  resolution  channel  more  robust,  it  comes  at  the 
cost  of  reduced  quality,  for  a  fixed  total  bit  rate  budget  for  source  and  channel  coding. 
The  TCM  scheme  of  Example  B,  on  the  other  hand,  does  not  sacrifice  source  coding 
quality  compared  to  that  of  the  uncoded  system,  but  it  requires  an  expanded  modulation 
constellation. However, as seen from Figure 11(b), for a probability of fine channel packet
loss of 10^-1 (reasonable to get good full resolution quality if the coarse channel is near-perfect
and error concealment is invoked: see Section 6.1 and Figure 21), one needs only a
simple  4-state  Ungerboeck  trellis  to  get  most  of  the  coding  gain.  Thus,  Example  B  seems 
like  an  attractive  solution  if  it  can  meet  the  coverage  and  quality  demands. 

The  hybrid  scheme  of  Example  E  may  be  necessary  to  address  a  particular  broadcast 
Example          A         B         C              D
Description      MR QAM    MR TCM    Embedded ECC   Multiplexed ECC
Section          B         B         4.4            4.4
Block Diagram    Fig. 8    Fig. 10   Fig. 12        Fig. 14
Simulation       Fig. 9    Fig. 11   Fig. 13        Fig. 14

Table 1: Summary of presented alternatives.

problem, especially if more than two grades of service are required. The scheme of Example
C (embedded ECC’s), while efficient in an information theoretical sense, is unlikely
to  meet  the  bit  rate  ratios  of  the  different  resolutions  required  of  most  practical  HDTV 
schemes,  and  is  hence  omitted  from  our  discussion. 

Table  2  gives  a  comparison  of  Examples  A,  B,  and  D  for  a  typical  problem.  We  fix  the 
coarse  channel  quality  and  coverage  requirement  for  receiver  CNR’s  above  20  dB/symbol 
at a delivered packet error rate (PER) of less than 10^-6, and compare the full resolution
quality  and  coverage  range  for  the  different  schemes.  As  seen,  all  schemes  perform  well 
with  respect  to  an  uncoded  system.  Note  the  benefit  of  error  concealment  used  by  the 
MR  source  decoder  to  increase  robustness.  Example  B,  operating  at  A  =  0.3  with  an 
embedded  4-state  trellis  is  the  best  choice  if  constellation  expansion  is  tolerable,  while 
Example  A  is  a  good  low-complexity  solution.  Example  D  gives  the  same  coverage  as 
Example  B,  but  it  requires  a  complicated  ECC  which  also  results  in  15%  reduced  fine 
channel  bit  rate,  and  therefore  a  degraded  full-resolution  quality. 

We  now  compare  the  A-modulation  scheme  of  Example  A  with  the  ECC  scheme  of 
Example  D.  Figure  14  (b)  of  Example  D  shows  how  using  ECCs  lowers  the  probability  of 
coarse packet loss over the range of CNR’s of interest. A comparison with Figure 9 shows
how  we  can  achieve  similar  performances  via  either  the  channel  modulation  parameter  A 
or  an  appropriate  ECC. 

Figure 16 shows the different performances obtained with a modulation domain UEP
achieved  for  A=0.5  and  an  ECC  domain  UEP  obtained  by  protecting  the  coarse  channel 
Example                      Uncoded       A                B                D

Coarse channel
  PER                        < 10^-6       < 10^-6          < 10^-6          < 10^-6
  CNR range (dB/symbol)      > 27          > 20             > 20             > 20
  Low-resolution quality     -             Same as uncoded  Same as uncoded  Same as uncoded

Fine channel
  PER                        < 10^-6 (*)   < 10^-1          < 10^-1          < 10^-1
  CNR range (dB/symbol)      > 27          > 27             > 24             > 24
  High-resolution quality    -             Same as uncoded  Same as uncoded  Fine channel bitrate
                                                                             15% less than uncoded (**)

Design parameter             -             A = 0.3          A = 0.3          BCH(255,179,10)
Complexity                   -             Same order as    Higher than      Higher than
                                           uncoded          uncoded          uncoded
Increase in coverage         -             Coarse: +7 dB    Coarse: +7 dB    Coarse: +7 dB
over uncoded                               Full: +0 dB      Full: +3 dB      Full: +3 dB

Table 2: Comparison of schemes A, B and D. We require Packet Error Rate (PER) less
than 10^-6 for the coarse channel at 20 dB/symbol CNR. We compare performance for
fine channel PER less than 10^-1.

(*) For the uncoded system, the fine channel error rate cannot be 10^-1 as error concealment
requires “perfect” coarse channel performance at that fine channel error rate. See
Section 6.1.

(**)  The  reduction  in  bitrate  available  for  source  coding  is  due  to  the  use  of  an  ECC. 
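The coverage gains in the last row of Table 2 follow directly from the CNR thresholds listed in the same table. The short sketch below recomputes them; the dictionary layout and names are ours.

```python
# CNR thresholds (dB/symbol) at which each channel becomes usable, from Table 2.
uncoded_threshold = {"coarse": 27, "fine": 27}
scheme_thresholds = {
    "A": {"coarse": 20, "fine": 27},
    "B": {"coarse": 20, "fine": 24},
    "D": {"coarse": 20, "fine": 24},
}

# Coverage gain = reduction in required CNR relative to the uncoded system.
gains = {
    name: {ch: uncoded_threshold[ch] - thr[ch] for ch in thr}
    for name, thr in scheme_thresholds.items()
}

for name, gain in gains.items():
    print(name, "coarse: +%d dB" % gain["coarse"], "full: +%d dB" % gain["fine"])
```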
[Figure 16 plot: SNR (dB) vs. CNR (dB/symbol). Thresholding assumed at Pr(coarse packet error) = 10^-6
and Pr(fine packet error) = 10^-1.]


Figure 16: Tradeoff between using modulation domain protection via transmission parameter
A = 0.5, and ECC domain protection using a BCH(127,99,4) code applied to the
coarse channel.

with  a  BCH(127,99,4)  code.  The  two  UEP  schemes  provide  identical  coarse  channel 
performance,  with  CNR’s  below  20  dB/symbol  receiving  no  signal,  and  the  crossover 
from coarse resolution quality (SNR = β) to full resolution occurring at 24 dB/symbol
and  26  dB/symbol  respectively  for  the  ECC  (Example  D)  and  A-modulation  (Example 
A)  schemes.  Note  however,  that  the  parity  bits  needed  by  the  ECC-protected  coarse 
channel  must  necessarily  come  at  the  expense  of  a  lower  fine  channel  bitrate,  resulting  in 
degraded  full  resolution  quality,  as  noted  in  Table  2. 

Thus, if a comparison is to be made on the basis of equal bandwidth, the ECC scheme
would necessarily have lower full resolution quality (SNR = α1) than the MR modulation
scheme (SNR = α2) for all receiver CNR’s better than 25.5 dB/symbol, but the full resolution
gain in CNR is 1.5 dB for the ECC scheme (24 dB vs. 25.5 dB). The assessment of
the tradeoff depends on the values of α1, α2, and β, which in turn depend on the source
coding used.

The  following  points  of  comparison  between  the  two  schemes  of  Example  A  (MR 
modulation)  and  D  (ECC  scheme)  are  worthy  of  note: 

•  The coverage tradeoff is between the modulation scheme’s degradation of quality
by (α1 - β) dB for receivers between 24 dB/symbol and 25.5 dB/symbol CNR,
versus the ECC scheme’s degradation of full-resolution quality by (α2 - α1) dB for all
receivers with CNR better than 25.5 dB/symbol.

•  Note  the  complexity  disparity  in  the  two  schemes,  with  the  ECC  scheme  resorting  to 
a  complicated  BCH  code,  while  the  MR  embedded  modulation  QAM  scheme  comes 
at  relatively  little  excess  cost  over  that  of  a  uniform  QAM  scheme  which  must  be 
used  for  transmission  anyway. 

•  The  MR  modulation  parameter  A,  being  a  continuous  variable,  also  affords  any 
desired  operating  point  over  the  range  of  CNR’s  of  interest,  while  the  ECC  scheme, 
being  discrete  in  nature,  may  not  afford  a  solution  at  any  desired  operating  point. 

•  In  an  information-theoretic  sense,  an  embedded  MR  coding  scheme  outperforms  a 
non-embedded one, and embedding is accomplished much more easily in the modulation
domain. As the ECC scheme uses a Hamming distance metric compared to
a  softer  Euclidean  distance  criterion  for  the  modulation  scheme,  the  latter  is  more 
efficient. 
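The coverage tradeoff between the two schemes can be made concrete with a stepwise quality function. In the sketch below the thresholds (20, 24 and 25.5 dB/symbol) are those quoted in the text, while the SNR values assigned to β, α1 and α2 are hypothetical placeholders, since the actual values depend on the source coder.

```python
def received_quality(cnr_db, coarse_threshold_db, full_threshold_db,
                     coarse_snr_db, full_snr_db):
    """Delivered SNR in dB, or None when the receiver gets no signal."""
    if cnr_db < coarse_threshold_db:
        return None
    if cnr_db < full_threshold_db:
        return coarse_snr_db
    return full_snr_db


# Hypothetical quality levels in dB (beta < alpha_1 < alpha_2).
BETA, ALPHA_1, ALPHA_2 = 30.0, 36.0, 38.0


def ecc_quality(cnr_db):
    # Example D: full resolution from 24 dB/symbol, at the lower quality alpha_1.
    return received_quality(cnr_db, 20.0, 24.0, BETA, ALPHA_1)


def modulation_quality(cnr_db):
    # Example A: full resolution from 25.5 dB/symbol, at quality alpha_2.
    return received_quality(cnr_db, 20.0, 25.5, BETA, ALPHA_2)


for cnr in (19.0, 22.0, 25.0, 27.0):
    print(cnr, ecc_quality(cnr), modulation_quality(cnr))
```

Between 24 and 25.5 dB/symbol the ECC scheme is ahead by α1 - β, and above 25.5 dB/symbol the modulation scheme is ahead by α2 - α1, which is the tradeoff summarized in the first bullet above.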

5.2  An  efficient  choice  of  system  parameters 

Assume that the chosen modulation constellation constrains the coarse and fine channels
to operate at bit rates of Rc and Rf (for our MR 64 QAM example, we must have
Rc : Rf = 2:4). Note that Rc and Rf represent the combined bit budget to be allocated
between source coding and channel coding (i.e. error correction) for the coarse and fine
channels respectively. Refer to Section 2 and Figure 3. As the coarse channel is the
“anchor” channel for the MR system, it should meet the desired coverage requirement
dc of Figure 3 at the maximum low-resolution quality that the bitrate constraint will
permit. To this end, a sensible strategy would be to allocate the coarse channel bit
budget completely for source coding, while using the embedded A-modulation scheme of
Example A to provide the needed robustness for the coarse channel. Thus, it would be
efficient to operate at the maximum value of A for which the coarse-distance coverage
range requirement dc is satisfied with the desired reliability. This approach is reasonable
because  the  coarse  channel  represents  the  fallback  mode  and  not  only  provides  a  minimum 
quality  in  those  areas  where  the  fine  channel  cannot  be  reliably  decoded,  but  also  allows  for 
better  error  concealment  in  the  transition  area  between  full  resolution  and  lower  resolution 
coverage  (See  Section  6.1.).  Moreover,  the  higher  the  low  resolution  quality,  the  higher 
the  full  resolution  quality  will  be,  for  a  given  modulation-fixed  fine  channel  bit  budget. 

If  the  above  approach  results  in  an  impractically  low  value  of  A,  one  could  resort 
to  a  hybrid  scheme  using  ECC’s  and  a  practical  A-constellation.  This  could  be  used  to 
“boost”  the  coarse  channel  coverage,  albeit  at  the  expense  of  a  lower  coarse-resolution 
quality, since part of the total budget must now be diverted from source coding to channel
coding. 

Now, the Rf bits of the fine channel have to be allocated between error protection
(Rf,c) and source coding (Rf,s), where Rf = Rf,s + Rf,c. As was discussed in Section 4, an
efficient strategy is to protect the fine channel with the MR TCM scheme of Example B.
If the constellation expansion is reasonable, this is efficient, as it costs no error protection
bits (although the subtlety lies in the fact that we have an expanded constellation);
i.e. Rf,c = 0. However, if the constellation is not practical in size, or if an increase in
full-resolution coverage is desired, we may need an ECC to help satisfy coverage range.

The essential point is that once the coverage distances (i.e. dc and df) have been
fixed, the joint source channel coding problem has been converted into a simpler one,
thus enabling us to determine the remaining free variables of the system, given dc, df, Rc
and Rf.

(Step  1)  Use  the  budget  allocated  to  the  coarse  channel,  Rc,  to  maximize  the  quality  of 
the  source  coder  for  that  channel.  (One  could  for  example  adopt  the  guidelines  of 
the  optimal  source  bit  allocation  strategy  described  in  [31]  to  achieve  optimality  for 
the  pyramidal  coder  considered  in  this  paper.) 

(Step 2) Use a MR modulation scheme (Example A) and set A to the maximum value
for which, at distance dc from the emitter, the error rate for the coarse channel is
below the desired threshold.

(Step 3) Use a MR TCM scheme (Example B) to protect the fine channel if the constellation
size is reasonable. An embedded multi-dimensional TCM scheme may be
deployed [23] to reduce the expansion factor, if complexity permits. Else, and if an
additional increase in coverage (coding gain) is desired beyond that affordable by
TCM, find an efficient error correction code (convolutional or block) that satisfies
the desired fine channel error probability at distance df. This code will use Rf,c bits
and therefore the remaining Rf,s = Rf - Rf,c will be used for source coding. However,
due to the effect of error concealment techniques feasible with an MR design,
it is unlikely that the fine channel would require additional protection (see Section
6.1). Note that if the MR TCM scheme will suffice to meet the requirements, no
extra channel bits would be needed and Rf,s = Rf.

(Step 4) Finally, use the remaining Rf,s bits for the fine channel source allocation, in an
efficient manner, as in Step 1.
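The four steps can be sketched as a small allocation routine. This is only an illustration of the bookkeeping: the function names, the table of coarse error rates versus A, and the numeric budgets are hypothetical, and in practice the error rate at distance dc would come from simulation curves such as those of Figure 9.

```python
def choose_lambda(coarse_error_at_dc, target_error, candidates):
    """Step 2: largest parameter value whose coarse error rate meets the target."""
    feasible = [lam for lam in candidates if coarse_error_at_dc(lam) <= target_error]
    return max(feasible) if feasible else None


def split_fine_budget(r_f, tcm_sufficient, ecc_bits=0.0):
    """Steps 3 and 4: return (Rf,s, Rf,c) for the fine channel."""
    r_fc = 0.0 if tcm_sufficient else ecc_bits
    if r_fc > r_f:
        raise ValueError("ECC bits exceed the fine channel budget")
    return r_f - r_fc, r_fc


# Hypothetical coarse-channel error rates at distance dc for a few values of A.
ERROR_TABLE = {0.1: 1e-9, 0.2: 1e-7, 0.3: 1e-5, 0.4: 1e-3, 0.5: 1e-2}

r_c, r_f = 2.0, 4.0                  # bits/symbol, as in the MR 64 QAM example
r_cs = r_c                           # Step 1: whole coarse budget to source coding
lam = choose_lambda(ERROR_TABLE.__getitem__, 1e-6, sorted(ERROR_TABLE))
r_fs, r_fc = split_fine_budget(r_f, tcm_sufficient=True)

print(r_cs, lam, r_fs, r_fc)
```

When the TCM scheme is sufficient the whole fine budget goes to source coding (Rf,c = 0); passing `tcm_sufficient=False` with a nonzero `ecc_bits` models the fallback to an explicit error correction code.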
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6  Comparison  of  MR  embedded,  MR  independent, 
and  SR  constellations 

Simulations  were  carried  out  for  an  Additive  White  Gaussian  Noise  (AWGN)  channel 
for the multiresolution embedded constellation, the multiresolution non-embedded constellation
(i.e. independently transmitted constellations for the two resolutions), and the
single  resolution  constellation,  as  shown  in  Figure  17.  The  independent  case  refers  to 
separate  transmission  of  the  coarse  and  fine  channels  using  “naive  multiplexing”  of  the 
frequency  spectrum.  To  ensure  fairness  of  comparison,  all  three  cases  were  tailored  to 
operate  under  conditions  of  equal  average  power  (it  can  be  shown  that  the  comparison 
under  equal  peak  power  constraint  would  be  similar)  and  equal  spectral  efficiency  (i.e. 
throughput/bandwidth). 

To  compare  the  MR  vs.  independent  constellations,  a  MR  64-QAM  (of  free  parameter 
A),  and  a  16/256  QAM  (coarse/fine)  independent  constellation  pair  were  picked.  The 
independent  channels  have  a  spectral  efficiency  of  4  bits/symbol  and  8  bits/symbol,  or  an 
average  spectral  efficiency  (6  bits/symbol)  identical  to  that  of  the  MR  64-QAM.  Recall  the 
curves  of  Figure  9.  Also  shown  on  these  curves  is  the  performance  of  the  non-embedded 
MR  scheme  for  the  independent  constellations  of  16  QAM  (coarse)  and  256  QAM  (fine). 
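The spectral efficiency figures quoted above can be verified quickly. The sketch assumes, as the text does, that the two independent constellations share the spectrum equally, so that their efficiencies are simply averaged.

```python
from math import log2


def bits_per_symbol(constellation_points):
    # An M-point constellation carries log2(M) bits per symbol.
    return log2(constellation_points)


coarse_independent = bits_per_symbol(16)    # 16 QAM
fine_independent = bits_per_symbol(256)     # 256 QAM
average_independent = (coarse_independent + fine_independent) / 2
embedded = bits_per_symbol(64)              # MR 64-QAM: 2 coarse + 4 fine bits

print(coarse_independent, fine_independent, average_independent, embedded)
```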

As  was  mentioned  in  Section  4.2,  for  the  range  of  values  of  A  from  about  0.2  to  0.4,  the 
embedded  MR  scheme  outperforms  the  multiplexed  MR  scheme  for  both  coarse  and  fine 
channels.  In  order  to  get  a  comprehensive  picture  of  the  situation,  a  plot  of  received 
quality  (SNR)  vs.  receiver  CNR  is  shown  in  Figure  18(a),  using  perceptually  consistent 
thresholding of the curves of Figure 9 at coarse and fine packet loss probabilities of 10^-6
and 10^-1 respectively, as justified earlier. As can be seen from Figure 18 (a) (and Figure
8(b)),  the  MR  constellation  outperforms  the  independent  one  over  all  ranges  of  CNR's 
for  some  A  values  (e.g.  A=0.2). 

In our comparison, we assume that the SR source coder is 16% more efficient than the
MR coder. This is a “worst case” analysis from the MR point of view, as an empirical
comparison using the popular “Lenna” image even in a non-MR-friendly framework like
the still-image coding standard JPEG [32] revealed only a 16% increase in source compression
for the SR JPEG scheme. Under these conditions, the SR channel could afford a
32-QAM modulation scheme for the same transmitter power as the MR 64-QAM scheme
due to a source compression advantage of roughly 5/6. For fairness of comparison, the
SR scheme received the same thresholding (10^-6) as the coarse resolution packet stream,
as they both achieve transitions from the region of no signal (oblivion) to the region of
discernible signal.
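The choice of a 32-QAM constellation for the SR system follows from the 5/6 compression ratio. A minimal check, with variable names of our own choosing:

```python
from fractions import Fraction

mr_bits_per_symbol = 6                     # MR 64-QAM
compression_ratio = Fraction(5, 6)         # SR bits needed relative to MR

sr_bits_per_symbol = mr_bits_per_symbol * compression_ratio
sr_constellation_points = 2 ** int(sr_bits_per_symbol)
saving_percent = float((1 - compression_ratio) * 100)

print(sr_bits_per_symbol, sr_constellation_points, round(saving_percent, 1))
```

The SR coder needs 5 bits per symbol, i.e. a 32-point constellation, and the saving of 1/6 (about 16.7%) matches the roughly 16% efficiency advantage assumed for the SR source coder.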

Note that for the MR 64-QAM scheme, the coarse/fine packetized channels could
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Figure  17:  (a)  MR  64-QAM  constellation  of  parameter  A.  (b)  Independent  modulation 
constellations  (16/256  QAM)  for  coarse  and  fine  channels,  (c)  Single  resolution  32-QAM 
constellation.  All  constellations  use  equal  power. 
be interpreted as entering virtual independent buffers with throughputs in the ratio of
1:2,  with  instantaneous  temporal  mismatches  in  the  input  channel  rates  being  absorbed 
by  the  buffers  and  if  necessary,  to  prevent  overflow  or  underflow,  resolved  by  exchange  of 
data  between  the  buffers,  resulting  in  minimal  degradation  for  slight  mismatches. 

The  results  shown  in  Figure  18  indicate  the  tradeoffs  involved.  As  can  be  seen  by 
comparing the SR scheme with, say, the MR embedded scheme with A = 0.5, the broadcast
coverage area is much greater for the MR scheme, at the price of some mid-region
suboptimality. 

A  point  to  note  in  favor  of  the  MR  scheme  is  the  increase  in  full-resolution  quality 
coverage  area  made  possible  by  performing  error  concealment  techniques  to  be  described 
next.  The  SR  scheme  loses  this  advantage  as  it  has  no  coarse  resolution  channel  to  fall 
back  upon. 

6.1  Error  concealment 

Due to the nature of the broadcast communication, it is impossible (or perhaps impractical)
to achieve error-free transmission. Therefore, any real system has to be able to
function in the presence of transmission errors, which may range from occasional bit errors
in a satellite system to packet losses in digital networks. Communication systems also
vary  in  their  resilience  to  error  and  speed  of  recovery.  Bitstreams  are  often  packetized  to 
speed  up  resynchronization  in  case  of  a  channel  error,  but  a  single  bit  error  still  renders 
the  whole  packet  unusable.  Recursive  systems  (motion-compensated  hybrid  DCT  being 
the  typical  example)  take  much  longer  to  recover,  specifically  until  the  next  restart  of  the 
prediction  loop.  An  error  concealment  scheme  is  often  required  to  mask  those  errors  and 
provide  a  gracefully  degrading  picture. 

The  source  coder  we  have  used  is  based  on  a  finite  memory  structure,  and  errors  would 
not  accumulate  but  die  out  within  a  few  time  samples.  The  structure  used  in  conjunction 
with  the  MR  modulation  also  allows  very  successful  error  concealment.  As  can  be  seen 
in Figure 9, for typical values of A, at the same CNR’s for which the fine channel packet
error rate is greater than 10^-1, the coarse channel is almost perfect (packet error rate less
than 10^-6). Concealment strategies are typically based on a much lower packet loss rate
(on the order of 10^-6, as in [33]), since transmission systems based on prediction loops
are extremely fragile.

Therefore, most of the errors will occur in the fine detail, and a coarse version and motion
vectors will be available for concealment. The concealment strategy differs slightly
for the frames that are interpolated spatially or temporally, and assumes that the information
transmitted in the coarse channel is intact. The spatially interpolated frames of
the finest layer are called anchor frames, as they have no temporal dependence on any
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Figure  18:  Typical  broadcast  environment  (a)  SNR  vs.  Receiver  CNR.  (b)  Broadcast 
ranges  for  the  different  constellations. 
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Figure  19.  Resolutions  of  the  pyramid  (a)  Coarsest  layer,  (b)  Intermediate  layer,  (c) 
Full-resolution  layer.  Images  are  of  size  128x128,  256x256  and  512x512,  respectively. 
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Figure  21:  Effect  of  error  concealment  for  15%  fine-channel  packet  loss  (blow  up  of  Figure 
20): (a) Corrupted spatial residual frame, (b) Reconstruction without error concealment,
(c)  Reconstruction  with  error  concealment. 
other frame.

For  the  temporally  interpolated  frames,  motion  vectors  and  the  selected  interpolation 
mode  for  each  block  are  available,  but  the  actual  interpolation  error  (or  the  residual)  is 
lost.  In  the  packetized  transmission  we  have  implemented,  the  typical  region  affected  by 
a  packet  loss  is  a  narrow  strip  8  pixels  in  height,  and  1000-2000  pixels  long.  Because  the 
encoder  uses  a  smooth  motion  vector  field  (enforced  by  the  hierarchical  motion  estimation 
algorithm),  and  both  previous  and  next  frames  are  available  for  interpolation,  errors  tend 
to  be  very  small.  Most  artifacts  show  up  as  “blockiness”  and  are  almost  invisible,  even 
in  a  still  frame.  Since  these  frames  are  not  used  to  predict  any  other  frames,  the  errors 
do  not  need  to  be  processed  further. 

The  errors  are  more  visible,  and  potentially  last  longer  for  the  spatially  interpolated 
(anchor)  frames.  The  artifacts  appear  as  blurred  blocks  or  decreased  spatial  resolution, 
and  are  clearly  visible  in  still-frame.  (See  Figure  21(b).)  Furthermore,  since  the  previous 
and  next  frames  (which  are  temporally  interpolated)  are  based  on  this  frame,  errors  can 
be  annoying  in  real-time.  The  concealment  is  based  on  replacing  the  region  affected  by 
the  lost  packet  from  the  previous  anchor  frame.  Since  motion  vectors  are  not  available 
for  this  frame,  an  approximation  is  computed  based  on  the  motion  vectors  of  the  previous 
frame.  Then,  this  is  used  to  interpolate  blocks  from  the  previous  anchor  frame.  This 
works  very  well  in  practice,  as  motion  vectors  resemble  the  true  motion. 
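The anchor-frame concealment described above can be sketched as a motion-compensated copy from the previous anchor frame. The code below is a simplified illustration under our own assumptions: frames are plain lists of pixel rows, a single integer motion vector is applied to the whole lost strip (the actual scheme works per block), and source coordinates are clamped at the frame border.

```python
def conceal_strip(current, previous_anchor, top, height, motion):
    """Fill a lost horizontal strip from the previous anchor frame.

    motion = (dy, dx) is the approximate displacement borrowed from the
    previous frame's motion vectors: pixel (y, x) is taken from
    (y - dy, x - dx) in the previous anchor frame.
    """
    dy, dx = motion
    rows, cols = len(current), len(current[0])
    repaired = [row[:] for row in current]      # leave the input untouched
    for y in range(top, min(top + height, rows)):
        for x in range(cols):
            src_y = min(max(y - dy, 0), rows - 1)
            src_x = min(max(x - dx, 0), cols - 1)
            repaired[y][x] = previous_anchor[src_y][src_x]
    return repaired


# Tiny 4x4 example: rows 1 and 2 of the current frame were lost (zeros).
previous = [[10 * y + x for x in range(4)] for y in range(4)]
damaged = [[0] * 4 for _ in range(4)]
repaired = conceal_strip(damaged, previous, top=1, height=2, motion=(0, 1))
print(repaired[1], repaired[2])
```

In the system described here the strip would be 8 pixels high and span 1000-2000 pixels, and the copied region would then be refined with whatever coarse-channel information is available.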

This  concealment  strategy  gives  excellent  results  even  in  extreme  cases  of  packet  loss. 
Complete  loss  of  a  frame  can  be  tolerated,  and  sustained  15%  packet  loss  rate  causes 
no  visible  loss  in  quality.  Figure  20  shows  the  effect  of  15%  fine-packet  loss  (obtained 
for  A=0.5,  CNR=25.5  dB/symbol)  on  the  spatial  residual  of  the  sequence,  with  Figure 
20(c)  showing  the  reconstructed  quality,  while  Figure  21  illustrates  the  power  of  error 
concealment  in  a  MR  environment. 

7  Conclusion 

We  have  demonstrated  a  multiresolution  (MR)  joint  source  channel  coding  system,  where 
using  a  source  coder  matched  to  an  embedded  trellis-coded  modulation  constellation 
(with/without  error  correction  coding)  has  been  shown  to  provide  an  efficient  end-to- 
end  MR  system.  The  threshold  effect  plaguing  single  resolution  (SR)  systems  is  softened 
by  a  stepwise  graceful  degradation  reminiscent  of  analog  systems,  without  sacrificing  the 
source coding advantage of digital schemes. We show the superiority of an embedded MR
transmission scheme over independent transmissions of the MR source resolutions, and
point out the tradeoffs in robustness and broadcast area coverage of low and high resolutions
between embedded MR and SR digital systems for QAM constellations, highlighting
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the  benefits  of  deploying  joint  MR  source  and  channel  coding. 
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Appendix 

Analysis  of  MR  QAM  of  section  4.2.1 
Let  us  introduce  some  definitions  associated  with  Figure  7. 

• $S$ = {set of all constellation points in the modulation scheme}.

• $N = |S|$ is the number of signals in the constellation.

• $D$ = {set of all “directions” (N, S, E, W) representing the one-sided independent
degrees of freedom for the additive Gaussian noise, with unit directional vectors
($u_N$, $u_S$, $u_E$, $u_W$) respectively}.

• $C_i = \{j \mid j \in S$ and $i, j$ are in the same cloud$\}$,
i.e. the set of all points which are in the same cloud as signal $i$.

• $d^k_{\mathrm{intra}}(i)$ ($d^k_{\mathrm{inter}}(i)$) = Euclidean distance between $i$ and its
nearest “fine” (“coarse”) neighbor in the (positive) $u_k$ direction. Thus,
$d^k_{\mathrm{intra}}(i)$ ($d^k_{\mathrm{inter}}(i)$), $k \in D$, $\forall i \in S$, is the minimum
instantaneous noise amplitude component in the $u_k$ direction that will cause the
receiver to incorrectly decode the intracloud (intercloud) information in that
direction. Note also that if a signal point should have no neighbor in the positive
$u_k$ direction, then its corresponding nearest-neighbor distance will be $\infty$.

From these definitions, using the simple Gaussian error function, we can obtain closed
form solutions to the probabilities of fine and coarse channel bit errors ($P_b^f$ and
$P_b^c$, respectively). It is easy to show that the probability of fine bit error for a
given λ and CNR is given by:

$$P_b^f(\lambda, \mathrm{CNR}) = \sum_{i \in S(\lambda,\mathrm{CNR})} \; \sum_{k \in D} \ldots \qquad (1)$$

where erfc(x) above is the standard complementary Gaussian error function defined
as

$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt \qquad (2)$$

and $q(i)$ in Equation 1 refers to the symbol probabilities, which, if assumed to be
equal, would simplify it to:

$$P_b^f(\lambda, \mathrm{CNR}) = \ldots \sum_{i \in S(\lambda,\mathrm{CNR})} \; \sum_{k \in D} \ldots \qquad (3)$$

Similarly, for the coarse bitstream, we have, assuming equiprobable symbols:

$$P_b^c(\lambda, \mathrm{CNR}) = \ldots \sum_{i \in S(\lambda,\mathrm{CNR})} \; \sum_{k \in D} \ldots \qquad (4)$$

In order to prevent error propagation, we packetize the streams into a composite length
of L bits/packet, comprising L/3 bits of coarse data and 2L/3 bits of fine information (as
demanded by the 1:2 ratio in coarse to fine bit rate). In the absence of ECC, we assume
that a single bit error anywhere in an entire packet corrupts that packet and causes it to
get lost. As was shown, due to the Karnaugh mapping, single bit errors will dominate.
Defining the packet error probabilities as $P_p^c$ and $P_p^f$ respectively for the coarse and fine
channels, we have:

$$P_p^c(\lambda, \mathrm{CNR}) = 1 - \left(1 - P_b^c(\lambda, \mathrm{CNR})\right)^{L/3} \qquad (5)$$

and,

$$P_p^f(\lambda, \mathrm{CNR}) = 1 - \left(1 - P_b^f(\lambda, \mathrm{CNR})\right)^{2L/3} \qquad (6)$$

See Figure 9 for a plot of the curves for L=1080 using the MR 64 QAM constellation for
both coarse and fine packet probability of loss performance as a function of the broadcast
area CNR for a multitude of λ values encompassing its region of definition from 0 to 1.
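
As a rough numerical illustration of (5) and (6), the sketch below turns a bit error probability into a packet loss probability, assuming independent bit errors and no ECC as stated above. The function name and the two sample bit error rates are illustrative assumptions and are not values from the paper; only L = 1080 and the 1:2 coarse-to-fine split are taken from the text.

```python
def packet_loss_probability(bit_error_prob: float, bits_in_packet: int) -> float:
    """Probability that at least one of `bits_in_packet` independent bits is in error."""
    if not 0.0 <= bit_error_prob <= 1.0:
        raise ValueError("bit_error_prob must lie in [0, 1]")
    return 1.0 - (1.0 - bit_error_prob) ** bits_in_packet


L = 1080                  # composite packet length in bits (value quoted for Figure 9)
COARSE_BITS = L // 3      # L/3 bits of coarse data per packet
FINE_BITS = 2 * L // 3    # 2L/3 bits of fine data per packet

# Hypothetical bit error rates, chosen only to exercise the formula.
coarse_loss = packet_loss_probability(1e-6, COARSE_BITS)
fine_loss = packet_loss_probability(1e-4, FINE_BITS)
```

Because the fine stream carries twice as many bits per packet, it is lost more often than the coarse stream even at equal bit error rates.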
Abstract

In  this  paper,  we  provide  an  overview  of  the  several  components  of  a  research  effort  aimed  at 
the  development  of  a  theory  of  multiresolution  stochastic  modeling  and  associated  techniques  for 
optimal  multiscale  statistical  signal  and  image  processing.  As  we  describe,  a  natural  framework  for 
developing  such  a  theory  is  the  study  of  stochastic  processes  indexed  by  nodes  on  lattices  or  trees 
in  which  different  depths  in  the  tree  or  lattice  correspond  to  different  spatial  scales  in  representing 
a  signal  or  image.  In  particular  we  will  see  how  the  wavelet  transform  directly  suggests  such  a 
modeling paradigm. This perspective then leads directly to the investigation of several classes of
dynamic models and related notions of “multiscale stationarity” in which scale plays the role of a
time-like variable. In this paper we focus primarily on the investigation of models on homogeneous
trees.  In  particular  we  describe  the  elements  of  a  dynamic  system  theory  on  trees  and  introduce  two 
notions  of  stationarity.  One  of  these  leads  naturally  to  the  development  of  a  theory  of  multiscale 
autoregressive  modeling  including  a  generalization  of  the  celebrated  Schur  and  Levinson  algorithms 
for  order-recursive  model  building.  The  second,  weaker  notion  of  stationarity  leads  directly  to  a 
class  of  state  space  models  on  homogeneous  trees.  We  describe  several  of  the  elements  of  the  system 
theory for such models and also describe the natural, extremely efficient algorithmic structures for
optimal  estimation  that  these  models  suggest:  one  class  of  algorithms  has  a  multigrid  relaxation 
structure;  a  second  uses  the  scale-to-scale  whitening  property  of  wavelet  transforms  for  our  models; 
and  a  third  leads  to  a  new  class  of  Riccati  equations  involving  the  usual  predict  and  update  steps 
and  a  new  “fusion”  step  as  information  is  propagated  from  fine  to  coarse  scales.  As  we  will  see,  this 
framework  allows  us  to  consider  in  a  very  natural  way  the  fusion  of  data  from  sensors  with  differing 
resolutions.  Also,  thanks  to  the  fact  that  wavelet  transforms  do  an  excellent  job  of  “compressing” 
large  classes  of  covariance  kernels,  we  will  see  that  these  modeling  paradigms  appear  to  have  promise 
in a far broader context than one might expect.

1 Introduction

In  recent  years  there  has  been  considerable  interest  and  activity  in  the  signal  and 
image  processing  community  in  developing  multi-resolution  processing  algorithms. 
Among  the  reasons  for  this  are  the  apparent  or  claimed  computational  advantages  of 
such  methods  and  the  fact  that  representing  signals  or  images  at  multiple  scales  is 
an evocative notion: it seems like a “natural” thing to do. One of the more recent
areas  of  investigation  in  multiscale  analysis  has  been  the  emerging  theory  of  multiscale 
representations  of  signals  and  wavelet  transforms  [10,  21,  22,  23,  24,  28,  33,  34,  38,  49]. 
This  theory  has  sparked  an  impressive  flurry  of  activity  in  a  wide  variety  of  technical 
areas,  at  least  in  part  because  it  offers  a  common  unifying  language  and  perspective 
and  perhaps  the  promise  of  a  framework  in  which  a  rational  methodology  can  be 
developed  for  multiscale  signal  processing,  complete  with  a  theoretical  structure  that 
pinpoints  when  multiresolution  methods  might  be  useful  and  why. 

It  is  important  to  realize,  however,  that  the  wavelet  transform  by  itself  is  not  the 
only  element  needed  to  develop  a  methodology  for  signal  analysis.  To  understand  this 
one  need  only  look  to  another  orthonormal  transform,  namely  the  Fourier  transform 
which decomposes signals into their frequency components rather than their components
at different resolutions. The reason that such a transform is useful is that its use
simplifies the description of physically meaningful classes of signals and important
classes of transformations of those signals. In particular, stationary stochastic
processes are whitened by the Fourier transform so that individual frequency components
of such a process are statistically uncorrelated. Not only does this greatly simplify
their analysis, but it also allows us to deduce that frequency-domain operations
such as Wiener or matched filtering (or their time domain realizations as linear
shift-invariant systems) aren’t just convenient things to do. They are in fact the right
(i.e., the statistically optimal) things to do. In analogy, what is needed to complement
wavelet transforms for the construction of a rational framework for multi-resolution
signal analysis is the identification of a rich class of signals and phenomena whose
description is simplified by wavelet transforms. Having this, we then have the basis
for developing a methodology for scale domain filtering and signal processing, for
deducing that such operations are indeed the right ones to use, and for developing
a  new  and  potentially  powerful  set  of  insights  and  perspectives  on  signal  and  image 
analysis  that  are  complementary  to  those  that  are  the  heritage  of  Fourier. 

In this paper we describe the several components of our research into the
development of a theory for multiresolution stochastic processes and models aimed at
achieving the objectives of describing a rich class of phenomena and of providing the
foundation for a theory of optimal multiresolution statistical signal processing. In
developing this theoretical framework we have tried to keep in mind the three distinct
ways in which multi-resolution features can enter into a signal or image analysis
problem. First, the phenomenon under investigation may possess features and physically
significant effects at multiple scales. For example, fractal models have often been
suggested for the description of natural scenes, topography, ocean wave height, textures,
etc. [5, 35, 36, 41]. Also, anomalous broadband transient events or spatially-localized
features can naturally be thought of as the superposition of finer resolution features
on a more coarsely varying background. As we will see, the modeling framework we
describe is rich enough to capture such phenomena. For example, we will see that
1/f-like stochastic processes as in [50, 51] are captured in our framework, as are
surprisingly useful models of many other processes. Secondly, whether the underlying
phenomenon has multi-resolution features or not, it may be the case that the data
that has been collected is at several different resolutions. For example, the
resolutions of remote sensing devices operating in different bands (such as IR, microwave,
and various band radars) may differ. Furthermore, even if only one sensor type is
involved, measurement geometry may lead to resolution differences (for example, if
zoomed and un-zoomed data are to be fused or if data is collected at different
sensor-to-scene distances). As we will see, the framework we describe provides a natural way
in which to design algorithms for such multisensor fusion problems.

Finally, whether the phenomenon or data have multi-resolution features or not,
the signal analysis algorithm may have such features motivated by the two principal
manifestations of the at least superficially daunting complexity of many image
processing problems. The first and more well-known of these is the use of multi-resolution
algorithms to combat the computational demands of such problems by solving coarse
(and therefore computationally simpler) versions and using these to guide (and
hopefully speed up) their higher resolution counterparts. Multigrid relaxation algorithms
for solving partial differential equations are of this type, as are a variety of computer
vision algorithms. As we will see, the stochastic models we describe lead to several
extremely efficient computational structures for signal processing.

The second and equally important issue of complexity stems from the fact that
a multi-resolution formalism allows one to exercise very direct control over “greed”
in signal and image reconstruction. In particular, many imaging problems are, in
principle, ill-posed in that they require reconstructing more degrees of freedom than
one has elements of data. In such cases one must “regularize” the problem in some
manner, thereby guaranteeing accuracy of the reconstruction at the cost of some
resolution. Since the usual intuition is precisely that one should have higher confidence
in the reconstruction of lower resolution features, we are led directly to the idea of
reconstruction at multiple scales, allowing the resolution-accuracy tradeoff to be
confronted directly. As we will see, the algorithms arising in our framework allow such
multi-scale reconstruction and provide the analytical tools both for assessing
resolution versus accuracy and for correctly accounting for fine scale fluctuations as a source
of “noise” in coarser scale reconstructions.

While there are several ways in which to introduce and motivate our modeling
framework, one that provides a fair amount of insight begins with the wavelet
transforms. However, the key for modeling is not to view the transform as a method
for analyzing signals but rather as a mechanism for synthesizing or generating such
signals beginning with coarse representations and adding fine detail one scale at a
time. Specifically let us briefly recall the structure of multiscale representations
associated with orthonormal wavelet transforms [22, 33]. For simplicity we do this in
the context of 1-D signals (i.e. signals with one independent variable), but the
extension to multidimensional signals and images introduces only notational rather
than mathematical complexity.

The multiscale representation of a continuous signal $f(x)$ consists of a sequence
of approximations of that signal at finer and finer scales where the approximation
of $f(x)$ at the mth scale consists of a weighted sum of shifted and compressed (or
dilated) versions of a basic scaling function $\phi(x)$:

$$f_m(x) = \sum_{n=-\infty}^{+\infty} f(m,n)\, \phi(2^m x - n) \qquad (1.1)$$

In order for the (m+1)st approximation to be a refinement of the mth, we require
$\phi(x)$ to be representable at the next scale:

$$\phi(x) = \sum_n h(n)\, \phi(2x - n) \qquad (1.2)$$

As shown in [22], $h(n)$ must satisfy several conditions for (1.1) to be an orthonormal
series and for several other properties of the representation to hold. In particular
$h(n)$ must be the impulse response of a quadrature mirror filter (QMF) [22, 44]. The
simplest example of such a $\phi$, $h$ pair is the Haar approximation with

$$\phi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (1.3)$$

$$h(n) = \begin{cases} 1 & n = 0, 1 \\ 0 & \text{otherwise} \end{cases} \qquad (1.4)$$
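
The Haar pair can be checked against the refinement equation (1.2) numerically. The following is a minimal sketch, assuming the half-open support 0 <= x < 1 for the scaling function; the names `haar_phi`, `HAAR_H`, and `refinement_rhs` are illustrative and do not come from the paper.

```python
def haar_phi(x: float) -> float:
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


# Haar filter taps h(n): 1 for n = 0, 1 and 0 otherwise.
HAAR_H = {0: 1.0, 1: 1.0}


def refinement_rhs(x: float) -> float:
    """Right-hand side of the refinement equation: sum_n h(n) phi(2x - n)."""
    return sum(h_n * haar_phi(2.0 * x - n) for n, h_n in HAAR_H.items())


# Compare both sides on a grid of dyadic points covering [-1, 2].
sample_points = [k / 16.0 for k in range(-16, 33)]
refinement_holds = all(haar_phi(x) == refinement_rhs(x) for x in sample_points)
```

On each half of the unit interval exactly one of the two compressed copies is active, which is why the unit box reproduces itself at the next scale.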


By considering the incremental detail added in obtaining the (m+1)st scale
approximation from the mth, we arrive at the wavelet transform. Such a transform is
based on a single function $\psi(x)$ that has the property that the full set of its scaled
translates $\{2^{m/2}\psi(2^m x - n)\}$ form a complete orthonormal basis for $L^2$. In [22] it is
shown that $\phi$ and $\psi$ are related via an equation of the form

$$\psi(x) = \sum_n g(n)\, \phi(2x - n) \qquad (1.5)$$

where $g(n)$ and $h(n)$ form a conjugate mirror filter pair [44], and that

$$f_{m+1}(x) = f_m(x) + \sum_n d(m,n)\, \psi(2^m x - n) \qquad (1.6)$$

Thus, $f_m(x)$ is simply the partial orthonormal expansion of $f(x)$, up to scale m, with
respect to the basis defined by $\psi$. For example if $\phi$ and $h$ are as in (1.3), (1.4), then

$$\psi(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (1.7)$$

$$g(n) = \begin{cases} 1 & n = 0 \\ -1 & n = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (1.8)$$

and $\{2^{m/2}\psi(2^m x - n)\}$ is the Haar basis.

One  of  the  appealing  features  of  the  wavelet  transforms  for  the  analysis  of  signals  is 
that  they  can  be  computed  recursively  in  scale,  from  fine  to  coarse.  Specifically,  if  we 
have the coefficients $\{f(m+1, \cdot)\}$ of the (m+1)st-scale representation we can “peel
off” the wavelet coefficients at this scale and at the same time carry the recursion one
complete step by calculating the coefficients $\{f(m, \cdot)\}$ at the next somewhat coarser
scale:

$$f(m,n) = \sum_k h(2n - k)\, f(m+1, k) \qquad (1.9)$$

$$d(m,n) = \sum_k g(2n - k)\, f(m+1, k) \qquad (1.10)$$
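
One fine-to-coarse step of the recursion (1.9), (1.10) can be sketched for the Haar case as follows. Two conventions here are assumptions made for the example rather than statements from the paper: the filter taps are scaled by 1/sqrt(2) so that the step is orthonormal, and coefficient n at the coarse scale is paired with coefficients 2n and 2n+1 at the fine scale. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def haar_analysis_step(fine: List[float]) -> Tuple[List[float], List[float]]:
    """Split fine-scale coefficients into coarse (scaling) and detail (wavelet) parts.

    Assumes Haar taps scaled by 1/sqrt(2) and the pairing (2n, 2n+1) -> n.
    """
    if len(fine) % 2 != 0:
        raise ValueError("expected an even number of fine-scale coefficients")
    scale = 1.0 / math.sqrt(2.0)
    half = len(fine) // 2
    # Low-pass branch: normalized sum of each pair.
    coarse = [scale * (fine[2 * n] + fine[2 * n + 1]) for n in range(half)]
    # High-pass branch: normalized difference of each pair.
    detail = [scale * (fine[2 * n] - fine[2 * n + 1]) for n in range(half)]
    return coarse, detail


coarse, detail = haar_analysis_step([4.0, 2.0, 5.0, 5.0])
```

With this normalization the step preserves energy: the squared coarse and detail coefficients together sum to the squared fine-scale coefficients.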

Reversing  this  process  we  obtain  the  synthesis  form  of  the  wavelet  transform  in 
which  we  build  up  finer  and  finer  representations  via  a  coarse-to-fine  scale  recursion: 

$$f(m+1, n) = \sum_k h(2k - n)\, f(m, k) + \sum_k g(2k - n)\, d(m, k) \qquad (1.11)$$
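
The coarse-to-fine recursion (1.11) can be sketched in the same spirit. As before, the 1/sqrt(2) scaling of the Haar taps and the convention that coefficient n at scale m produces coefficients 2n and 2n+1 at scale m+1 are assumptions made for this example, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def haar_synthesis_step(coarse: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Build the next finer scale from coarse and detail coefficients (Haar case)."""
    if len(coarse) != len(detail):
        raise ValueError("coarse and detail must have the same length")
    scale = 1.0 / math.sqrt(2.0)
    fine: List[float] = []
    for a, d in zip(coarse, detail):
        fine.append(scale * (a + d))  # child 2n
        fine.append(scale * (a - d))  # child 2n + 1
    return fine


def haar_synthesis(coarsest: Sequence[float],
                   details_by_scale: Sequence[Sequence[float]]) -> List[float]:
    """Run the recursion over several scales, coarsest details first."""
    coeffs = list(coarsest)
    for detail in details_by_scale:
        coeffs = haar_synthesis_step(coeffs, detail)
    return coeffs


root2 = math.sqrt(2.0)
fine = haar_synthesis_step([6.0 / root2, 10.0 / root2], [2.0 / root2, 0.0])
```

Viewed this way the transform acts as a generator: each pass doubles the number of coefficients, and the detail sequence plays the role of an input driving the recursion in scale.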

Thus we see that the synthesis form of the wavelet transform defines a dynamical
relationship between the coefficients $f(m,n)$ at one scale and those at the next. Indeed
this relationship defines a lattice on the points $(m,n)$, where $(m+1,k)$ is connected
to $(m,n)$ if $f(m,n)$ influences $f(m+1,k)$. The simplest example of such a lattice is
the dyadic tree illustrated in Figure 1, where each node t corresponds to a particular
scale/shift pair $(m,n)$. As with all these lattices, the scale index is indeed time-like,
with each level of the tree corresponding to a representation of signals or phenomena
at a particular scale. In this paper we focus for the most part on this tree structure
and on dynamic models and stochastic processes defined on it.* Note that this setting
has a natural association with the Haar transform in which the value at a particular
node $t = (m,n)$ is obtained from the average of the values at the two descendant nodes
$(m+1, 2n)$ and $(m+1, 2n+1)$. However, while the Haar transform indeed plays an
important role in our analysis, the dyadic tree and the pyramidal structure it captures
should be viewed in a broader sense as providing a natural setting for capturing
representations of signals at multiple resolutions where the relationships between the
representations at different resolutions need not be constrained to the rigid equalities
in (1.9) - (1.11). Rather, if we view these multiscale representations more abstractly,
much as in the notion of state, as capturing the features of signals up to a particular
scale that are relevant for the “prediction” of finer-scale approximations, we can define
rich classes of stochastic processes and models that contain the multiscale wavelet
representations of (1.9) - (1.11) as special (and in a sense degenerate) cases.

* In Sections 4 and 5 we briefly describe some aspects of the more general case.
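
The indexing of the dyadic tree described above can be made concrete with a small sketch. Nodes are represented as (m, n) tuples; the function names and the dictionary-based storage of node values are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (scale m, shift n)


def children(node: Node) -> List[Node]:
    """The two descendants (m+1, 2n) and (m+1, 2n+1) of node (m, n)."""
    m, n = node
    return [(m + 1, 2 * n), (m + 1, 2 * n + 1)]


def parent(node: Node) -> Node:
    """The node one scale coarser that has `node` as a descendant."""
    m, n = node
    return (m - 1, n // 2)


def haar_value_from_children(values: Dict[Node, float], node: Node) -> float:
    """Average of the values at the two descendant nodes (Haar association)."""
    left, right = children(node)
    return 0.5 * (values[left] + values[right])


fine_values = {(2, 0): 4.0, (2, 1): 2.0, (2, 2): 5.0, (2, 3): 5.0}
coarse_values = {(1, n): haar_value_from_children(fine_values, (1, n)) for n in range(2)}
```

Here `coarse_values` holds the pairwise averages at scale 1, and applying `parent` to either child returns the node it was generated from.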

Carrying  this  a  bit  farther,  let  us  return  to  the  point  made  earlier  that  for  wavelet 
transforms  to  be  useful  it  should  be  the  case  that  their  application  simplifies  the 
description  or  properties  of  signals.  For  example,  this  clearly  would  be  the  case  for 
a stochastic process that is whitened by (1.9), (1.10), i.e. for which the wavelet
coefficients $\{d(m, \cdot)\}$ at a particular scale are white and uncorrelated with the lower
resolution version $\{f(m, \cdot)\}$ of the signal. In this case (1.11) represents a first-order
recursion  in  scale  that  is  driven  by  white  noise.  However,  as  we  know  from  time  series 
analysis,  white-noise-driven  first-order  systems  yield  a  comparatively  small  class  of 
processes  which  can  be  broadened  considerably  if  we  allow  higher-order  dynamics. 
Also,  in  sensor  fusion  problems  one  wishes  to  consider  collectively  an  entire  set  of 
signals  or  images  from  a  suite  of  sensors.  In  this  case  one  is  immediately  confronted 
with  the  need  to  use  higher-order  models  in  which  the  actually  observed  signals  may 
represent  samples  from  such  a  model  at  several  scales,  corresponding  to  the  differing 
resolutions  of  individual  sensors. 
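
To make the idea of a first-order recursion in scale driven by white noise concrete, the sketch below generates a sample path by running the Haar form of (1.11) with independent Gaussian detail coefficients. The 1/sqrt(2) filter scaling, the unit-variance root coefficient, the use of the same detail standard deviation at every scale, and the function name are all assumptions chosen for illustration; they are not the models developed in the paper.

```python
import math
import random
from typing import List


def synthesize_from_white_noise(num_scales: int,
                                detail_std: float = 1.0,
                                seed: int = 0) -> List[float]:
    """Coarse-to-fine Haar synthesis driven by white Gaussian detail coefficients."""
    rng = random.Random(seed)          # seeded for reproducible sample paths
    scale = 1.0 / math.sqrt(2.0)
    coeffs = [rng.gauss(0.0, 1.0)]     # single coefficient at the coarsest scale
    for _ in range(num_scales):
        finer: List[float] = []
        for a in coeffs:
            d = rng.gauss(0.0, detail_std)   # white-noise input at this node
            finer.append(scale * (a + d))
            finer.append(scale * (a - d))
        coeffs = finer
    return coeffs


sample_path = synthesize_from_white_noise(num_scales=5)
```

Each pass doubles the number of coefficients, so `num_scales` steps yield a path of length 2**num_scales; setting `detail_std` to zero removes all fine-scale structure and leaves a constant path.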

In  this  paper  we  describe  two  stochastic  modeling  paradigms  for  multiresolution 
processes  that  have  as  their  motivation  the  preceding  observations  as  well  as  the  desire 
to  investigate  and  develop  multiscale  counterparts  to  the  notions  of  stationarity  and 
rationality  that  have  proven  to  be  of  such  value  in  time  series  analysis.  The  first 
step  in  doing  this  is  the  introduction  of  dynamics  and  concepts  of  shift-invariance  on 
dyadic trees, and in the next section we outline the elements of this formalism and
in particular introduce two notions of (second-order) shift-invariance for stochastic
processes on dyadic trees. In Section 3 we then use the stronger of these two notions to
develop  a  theory  of  multiscale  autoregressive  modeling  and  in  particular  we  describe 
a  generalization  of  the  celebrated  Schur  and  Levinson  algorithms  for  the  efficient 
construction  of  such  models.  Figure  2  illustrates  the  output  of  a  third-order  model 
of  this  type  displaying  some  of  the  fractal-like,  multi-scale  characteristics  that  can  be 
captured  by  this  class  of  models.  An  alternate  modeling  paradigm — coinciding  with 
that  of  Section  3  only  for  first-order  models — is  described  in  Section  4.  This  formalism, 
which  generalizes  finite-dimensional  state  models  to  dyadic  trees,  also  can  be  used 
to  capture  fractal-like  behavior  and  indeed  includes  the  1/f-like  models  developed 
in  [50,  51]  as  a  special  case.  Moreover  these  models  provide  surprisingly  accurate 
descriptions  of  a  broad  variety  of  stochastic  processes  and  also  lead  to  extremely 
efficient and highly parallelizable algorithms for optimal estimation and for the fusion
of multiresolution measurements using multiscale, scale-recursive generalizations of
Kalman filtering and smoothing. For example, Figure 3(a) illustrates the sample
path of a process with a 1/f-like spectrum and its optimal estimation based on noisy
measurements of the process collected only at the two ends of the data interval.
Figure 3(b) illustrates the use of our methodology for the estimation of the process
based on these noisy data augmented with coarser resolution measurements, i.e.
the  formalism  we  describe  allows  us,  with  relative  ease,  to  use  coarse  scale  data  to 
optimally  guide  the  interpolation  of  fine-scale  but  sparsely-collected  data.  Figures  3(c) 
and  3(d)  display  analogous  results  for  the  case  of  a  standard  Gauss-Markov  process 
in  which  an  approximate  multiscale  model  for  this  process  is  used  to  design  the 
coarse/fine  data  fusion  and  interpolation  algorithm. 

Due  to  the  limitations  of  space  our  presentation  of  the  various  topics  we  have 
mentioned  is  of  a  summary  nature.  References  to  complete  treatments  are  given, 
and,  in  addition,  in  Section  5  we  briefly  discuss  several  important  issues,  current  lines 
of investigation, and open questions.

2 Stochastic Processes and Dynamic Models on Dyadic Trees

In  this  section  we  introduce  the  machinery  needed  for  specifying  linear  models  of 
random processes on the dyadic tree, that is, for stochastic processes $y_t$ where t is
an  element  of  the  set  of  nodes,  T,  of  the  tree  of  Figure  1.  As  indicated  in  the 
introduction,  we  have  several  objectives  in  developing  such  models.  Our  first  objective 
is  to  introduce  models  that  can  be  specified  by  finitely  many  parameters  in  order  to 
provide  associated  effective  algorithms.  That  is,  we  would  like  to  develop  models 
analogous  to  those  specified  by  finite-order  difference  equations  or  finite-dimensional 
state models (i.e. those corresponding to rational system functions) which have
provided  the  setting  for  a  vast  array  of  powerful  methods  of  signal  and  system  analysis. 
Also,  recursive  models  of  this  type  are  naturally  associated  with  a  notion  of  causality. 
In  our  context  we  will  also  seek  recursive  structures  where  the  associated  notion  of 
causality  will  be  in  scale,  from  coarse  to  fine  as  in  the  wavelet  transform  synthesis 
equation  (1.11). 

Finally,  another  notion  from  time  series  that  we  will  want  to  adapt  to  our  context 
is  that  of  shift-invariance  or  stationarity.  To  understand  what  is  involved  in  this,  let 
us recall the usual notion of stationarity* for a discrete-time, zero-mean stochastic
process $y_t$, where in this case $t \in Z$, the integers. Such a process, with covariance
function

$$r_{t,s} = E[y_t y_s] \qquad (2.1)$$

is stationary if $r_{t,s} = r_{t+n,s+n}$ for all integers n. That is, shifting the time index of
the process by n leaves the statistics invariant. Since it is also obviously true that
$r_{t,s} = r_{s,t}$, we can immediately deduce that

$$r_{s,t} = r_{d(s,t)} \qquad (2.2)$$

where $d(s,t) = |t - s|$.

*In this paper we focus completely on linear models and second-order properties, which, of course,
yield complete descriptions if the processes considered are Gaussian.

In order to understand how we might generalize these ideas to the dyadic tree, we
need to make several observations. The first is that the integers Z and our dyadic tree
are both examples of homogeneous trees. Specifically a homogeneous tree of
multiplicity q is an infinite acyclic graph such that each node has exactly q+1 branches to
other nodes representing its neighbors. In the case of Z, q = 1, and the neighbors of
an integer t are simply t-1 and t+1. For the case of T, q = 2. However, Figure 1
isn’t the easiest way in which to see this or to understand notions of stationarity.
Specifically, in considering the usual notion of stationarity we are compelled to
consider processes defined on all of Z, and the same is true in our context. Thus, we
must be able to extend our tree in all directions capturing in particular the fact that
there is neither a finest nor a coarsest scale of description. A much more convenient
representation of T that allows such extensions is depicted in Figure 4. As we will
see, both Figures 1 and 4 will prove of use to us.

An important fact about trees is that there is a natural notion of distance $d(s,t)$
between two nodes, s and t, namely the number of branches on the path from s to
t, which reduces to $|t - s|$ for Z. This allows us to define the notion of an isometry,
that is a one-to-one and onto map of the tree onto itself that preserves distances.
For Z the only isometries are shifts, i.e. $t \mapsto t + n$, and reversals, i.e. $t \mapsto -t$
(and concatenations of these), so that a useful way (for us!) in which to define the
usual notion of stationarity is that the statistics of the process are invariant under
any isometry on the index set, i.e. $r_{t,s} = r_{\tau(t),\tau(s)}$ for any isometry $\tau$.
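
The tree distance can be computed by walking both nodes toward coarser scales until they meet. The sketch below assumes nodes are labeled by (m, n) pairs with non-negative shift n, so that any two nodes share a common ancestor at some (possibly negative) scale; this labeling and the function name are assumptions for illustration and not the representation of Figure 4.

```python
from typing import Tuple

Node = Tuple[int, int]  # (scale m, shift n), with n >= 0 assumed


def tree_distance(s: Node, t: Node) -> int:
    """Number of branches on the path between nodes s and t of the dyadic tree."""
    (ms, ns), (mt, nt) = s, t
    if ns < 0 or nt < 0:
        raise ValueError("this sketch assumes non-negative shift indices")
    steps = 0
    # Lift the finer node until both nodes sit at the same scale.
    while ms > mt:
        ms, ns, steps = ms - 1, ns // 2, steps + 1
    while mt > ms:
        mt, nt, steps = mt - 1, nt // 2, steps + 1
    # Lift both nodes together until they reach a common ancestor.
    while ns != nt:
        ns, nt, steps = ns // 2, nt // 2, steps + 2
    return steps


sibling_distance = tree_distance((2, 0), (2, 1))
cousin_distance = tree_distance((2, 0), (2, 2))
```

Two siblings are at distance two, through their shared parent, and two first cousins are at distance four, through their shared grandparent.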

It  is  this  type  of  notion  that  we  seek  to  generalize  to  the  dyadic  tree.  However, 
the  tree  T  has  many  isometries.  For  example  consider  an  isometry  pivoting  on  the 
node denoted “$s \wedge t$” in Figure 4, where all nodes below and to the right of this point
are left unchanged but the upper left-hand portion of the tree is “flipped” in that
the two branches extending from $s \wedge t$ are interchanged (so that, for example, u is
mapped  into  s).  Obviously  we  can  do  the  same  thing  pivoting  at  any  node.  We  refer 
the  reader  to  [14]  for  complete  treatments  of  the  nature  and  structure  of  isometries. 

The preceding discussion suggests a first notion of shift-invariance for a stochastic
process $y_t$ which we refer to as isotropy. Specifically $y_t$ is an isotropic process if its
statistics remain invariant under any isometry on the index set. As shown in [3, 6, 7, 8],
$y_t$ is isotropic if and only if its covariance as defined in (2.1) (with $t, s \in T$) satisfies
(2.2). Thus, as with a standard temporally-stationary process, an isotropic process on
T is characterized by a covariance sequence $r_0, r_1, r_2, \ldots$ and, as in the standard case,
we have two natural questions: (1) when does such a sequence of numbers correspond
to a valid covariance sequence for a process on T; and (2) how can we construct
dynamic models for the construction of an isotropic process corresponding to such a
valid sequence. A first form of the answer to the first question can actually be stated
a bit more generally. Specifically, if $S$ is any index set, and if $\{y_t, t \in S\}$ is a zero-mean
process defined on $S$, then its covariance $r_{s,t}$ must satisfy the following: select an
arbitrary finite family $t_1, \ldots, t_I$ in $S$; then the $I \times I$ matrix whose $(i,j)$-element is
$r_{t_i,t_j}$ must be non-negative definite since

$$\sum_{i,j} a_i a_j\, r_{t_i,t_j} = E\Big[\Big(\sum_i a_i\, y_{t_i}\Big)^2\Big] \ge 0 \quad \text{for all real } a_1, \ldots, a_I \qquad (2.3)$$


This property of r, which is necessary and sufficient for it to be the covariance of
such a process, will be referred to as positive definiteness in the sequel. For general
index sets it is not possible to find more useful criteria or characterizations of positive
definiteness. However for stationary time series, i.e. for $S = Z$ and $r_{s,t}$ satisfying
(2.2), much more can be said. In particular the celebrated Bochner spectral
representation theorem states that a sequence $r_n$, n = 0, 1, ..., is the covariance function of
a stationary time series if and only if there exists a nonnegative, symmetric spectral
measure $S(d\omega)$ so that

$$r_n = \int_{-\pi}^{\pi} e^{i\omega n}\, S(d\omega)$$


As shown in [2, 3] there is a corresponding generalized Bochner theorem for a
sequence $r_n$ to be the covariance of an isotropic process on T. Note that we can
obviously find a subset of T isomorphic to Z, i.e. a sequence of nodes extending
infinitely in both directions, and $y_t$ restricted to such a set is essentially a
temporally-stationary process. Thus for $r_n$ to be a valid covariance of an isotropic process on T
it must certainly be a valid covariance for a temporally-stationary process. However
there are additional constraints for isotropic processes: for example, in T we can
find three nodes which are all a distance two from one another (e.g. u, v, and $s \wedge t$
in Figure 4), and this implies an additional constraint on $r_n$. The impact of these
additional  constraints  can  be  seen  in  the  Bochner  theorem  in  [2,  3]  and  also  in  the 
results  described  in  the  next  section. 
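
The extra constraint coming from three mutually equidistant nodes can be checked numerically. The sketch below is only an illustration: the sequence $r_n = (-0.6)^{n/2}$ for even $n$ (zero for odd $n$) is an example chosen here, not one taken from the text, and the helper names are hypothetical. It is the covariance of an ordinary second-order AR time series, so its Toeplitz matrices are positive definite, yet the 3 x 3 matrix for three tree nodes at pairwise distance two is not.

```python
import numpy as np

def min_eig(mat):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(mat).min())

def toeplitz_cov(r, size):
    """Covariance matrix of `size` consecutive samples of a time series on Z."""
    lags = np.abs(np.subtract.outer(np.arange(size), np.arange(size)))
    return np.asarray(r)[lags]

def three_node_cov(r):
    """Covariance of three tree nodes that are pairwise at distance two."""
    return np.array([[r[0], r[2], r[2]],
                     [r[2], r[0], r[2]],
                     [r[2], r[2], r[0]]])

# Illustrative sequence (an assumption of this sketch):
# r_n = (-0.6)^(n/2) for even n and 0 for odd n.
N = 12
r = np.array([(-0.6) ** (n // 2) if n % 2 == 0 else 0.0 for n in range(N)])

min_eig_line = min_eig(toeplitz_cov(r, N))   # positive: valid on Z
min_eig_tree = min_eig(three_node_cov(r))    # negative: not valid on T
```

For this three-node matrix the eigenvalues are $r_0 + 2r_2$ and $r_0 - r_2$ (twice), so the additional requirement is $r_2 \ge -r_0/2$, which the example violates.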

While the Bochner theorem is a powerful characterization result for time series
and for processes on trees, it does not provide a computational procedure for testing
positive definiteness or for constructing models for such processes. However, for
time series we do have such a method, namely the Wold representation of stationary
processes via causal, autoregressive (AR) models. This representation and the
well-known Levinson algorithm for its construction not only provide a procedure for
testing positive-definiteness but also for constructing rational, finite-order models for
stationary processes. The subject of Section 3 of this paper is the extension of this
methodology to isotropic processes on trees. An important point in doing this is to
realize that such a construction for time series produces a model that treats time
asymmetrically (by imposing causality) in order to represent a process whose statistics
do not have inherent temporal asymmetry. This is not a point that is typically
highlighted since the geometry of $Z$ is so simple. However, the situation for $T$ is decidedly
more complex, and to carry out our program we need the following development,
which in essence relates the pictorial representations of Figures 1 and 4 and provides
the basis for defining causal systems in scale.

An important concept associated with any homogeneous tree is the notion of a
boundary point [2, 3, 6, 14, 15] of a tree. Consider the set of infinite sequences of nodes
on such a tree, where any such sequence consists of a set of distinct nodes $t_1, t_2, \ldots$
where $d(t_i, t_{i+1}) = 1$. A boundary point is an equivalence class of such sequences,
where two sequences are equivalent if they differ by a finite number of nodes. For the
case of $Z$ there are two boundary points corresponding to paths toward $\pm\infty$. For $T$
there are many. Let us choose one boundary point in $T$ which we denote by $-\infty$.
Note that from any node $t$ there is a unique path in the equivalence class defined by
$-\infty$ (i.e. a unique path from $t$ "towards" $-\infty$; see Figure 4). Then if we take any
two nodes $s$ and $t$, their paths to $-\infty$ must differ only by a finite number of points
and thus must meet at some node which we denote by $s \wedge t$ (see Figure 4). Thus, we
can define a notion of relative distance of two nodes to $-\infty$:

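$$ \delta(s,t) = d(s, s \wedge t) - d(t, s \wedge t) \tag{2.4} $$

so that

$$ s \preceq t \quad \text{("$s$ is at least as close to $-\infty$ as $t$")} \quad \text{if } \delta(s,t) \le 0 \tag{2.5} $$

$$ s \prec t \quad \text{("$s$ is closer to $-\infty$ than $t$")} \quad \text{if } \delta(s,t) < 0 \tag{2.6} $$

This also yields an equivalence relation on nodes of $T$:

$$ s \asymp t \iff \delta(s,t) = 0 \tag{2.7} $$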

For example, the points $s$, $u$, and $v$ in Figure 4 are all equivalent. The equivalence
classes of such nodes are referred to as horocycles. These equivalence classes are
best visualized as in Figure 1 by redrawing the tree, in essence by picking the tree
up at $-\infty$ and letting the tree "hang" from this boundary point. In this case the
horocycles appear as points on the same horizontal level and $s \preceq t$ means that $s$
lies on a horizontal level above or at the level of $t$. Note that in this way we make
explicit the dyadic structure of the tree as depicted in Figure 1 and provide the basis
for defining multiscale dynamic models.
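
The quantities $s \wedge t$, $d(s,t)$ and $\delta(s,t)$ are easy to compute once the tree is drawn "hanging" from $-\infty$. The following is a minimal sketch under an indexing assumption that is not part of the text: a node is a pair `(m, n)` of a scale `m` and a shift `n >= 0`, with the node one step closer to $-\infty$ being `(m - 1, n // 2)`. All function names are hypothetical.

```python
def parent(node):
    """One step toward -infinity: (m, n) -> (m - 1, n // 2)."""
    m, n = node
    return (m - 1, n // 2)

def meet(s, t):
    """The node where the paths from s and t toward -infinity merge."""
    while s[0] > t[0]:          # bring s up to the scale of t
        s = parent(s)
    while t[0] > s[0]:          # bring t up to the scale of s
        t = parent(t)
    while s != t:               # climb together until the paths coincide
        s, t = parent(s), parent(t)
    return s

def steps_up(node, ancestor):
    """Number of backward steps from a node to one of its ancestors."""
    return node[0] - ancestor[0]

def distance(s, t):
    """Tree distance d(s, t), measured through the meeting node."""
    a = meet(s, t)
    return steps_up(s, a) + steps_up(t, a)

def delta(s, t):
    """Relative distance to -infinity, as in (2.4)."""
    a = meet(s, t)
    return steps_up(s, a) - steps_up(t, a)

def same_horocycle(s, t):
    """Equivalence (2.7): both nodes sit on the same horizontal level."""
    return delta(s, t) == 0
```

With this indexing a horocycle is simply the set of nodes sharing the same scale `m`, and `delta(s, t)` reduces to the difference of the two scales.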

In order to define dynamics on trees, let us again step back to take a more careful
look at the usual formalism that is used for time series. Specifically, in specifying a
temporal system in terms of a difference equation we make essential use of the notion
of shifts or moves, e.g. in an AR model we relate $y_t$ to $y_{t-1}$, $y_{t-2}$, etc., where the
backward shift $z^{-1} : t \mapsto t - 1$ obviously plays an essential role in expressing the
"local" dynamics, i.e. the relationship of a signal at a particular point to its values
at nearby points. Moreover, thanks to the simple structure of $Z$, we have the luxury
of using the symbol $z^{-1}$ for two additional purposes. In particular, the backward
shift is an isometry and in fact it and its inverse, the forward shift, generate all
translations. Furthermore, we also use the symbol $z^{-1}$ and its positive and negative
powers to code signals, i.e. we represent the signal $y_t$ by its z-transform, and
together these properties provide us with the powerful transform domain formalism for
analyzing stationary, i.e. translation-invariant, systems.
The situation is decidedly more complex on $T$. To see this let us begin by defining
moves on $T$ that will be needed to provide a "calculus" for stochastic processes, i.e. for
specifying local dynamics. Such moves are illustrated in Figure 1 and are introduced
next:

•  $0$ the identity operator (no move)

•  $\bar\gamma$ the backward shift (move one step toward $-\infty$)

•  $\alpha$ the left forward shift (move one step away from $-\infty$ toward the left)

•  $\beta$ the right forward shift (move one step away from $-\infty$ toward the right)

•  $\delta$ the interchange operator (move to the nearest point in the same horocycle)

Note that the richer structure of $T$ requires a richer collection of moves. Also, unlike
its counterpart $z^{-1}$, the backward shift $\bar\gamma$ is not an isometry (it is onto but not
one-to-one), and it has two forward shift counterparts, $\alpha$ and $\beta$, which are one-to-one
but not onto. Also, while these shifts allow us to move up and down in scale (i.e.
from one horocycle to the next), it is necessary to introduce another operator, $\delta$,
in order to define purely translational shifts at a given level. Note also that $0$ and
$\delta$ are isometries and that these operators satisfy the following relations (where the
convention is that the left-most operator is applied first)*:

$$ \alpha\bar\gamma = \beta\bar\gamma = 0 \tag{2.8} $$

$$ \delta\bar\gamma = \bar\gamma \tag{2.9} $$

$$ \delta^2 = 0 \tag{2.10} $$

$$ \beta\delta = \alpha \tag{2.11} $$


* Our convention will be to write operators on the right.

Arbitrary moves on the tree can then be encoded via finite strings or words using
these symbols as the alphabet and the formulas (2.8)-(2.11). Specifically define the
language

$$ \mathcal{L} = \{\bar\gamma\}^* \cup \{\bar\gamma\}^* \delta \{\alpha,\beta\}^* \cup \{\alpha,\beta\}^* \tag{2.12} $$

where $K^*$ denotes arbitrary sequences of symbols in $K$ including the empty sequence,
which we identify with the operator $0$. Then any move on $T$ is uniquely represented
by a word of this language. It is straightforward to define a length $|w|$ for each word
in $\mathcal{L}$, corresponding to the number of shifts required in the move specified by $w$. Note
that

$$ |\bar\gamma| = |\alpha| = |\beta| = 1 $$

$$ |0| = 0, \quad |\delta| = 2 \tag{2.13} $$

Thus $|\bar\gamma^n| = n$, $|w_{\alpha\beta}|$ is the number of $\alpha$'s and $\beta$'s in $w_{\alpha\beta} \in \{\alpha,\beta\}^*$, and
$|\bar\gamma^n \delta w_{\alpha\beta}| = n + 2 + |w_{\alpha\beta}|$. This notion of length will be useful in defining the order of dynamic
models on $T$. We will also be interested exclusively in causal models, i.e. in models in
which the output at some scale (horocycle) does not depend on finer scales. For this
reason we are most interested in moves that either involve pure ascents on the tree,
i.e. all elements of $\{\bar\gamma\}^*$, or elements $\bar\gamma^n \delta w_{\alpha\beta}$ of $\{\bar\gamma\}^* \delta \{\alpha,\beta\}^*$ in which the descent is
no longer than the ascent, i.e. $|w_{\alpha\beta}| \le n$. We use the notation $w \preceq 0$ to indicate that
$w$ is such a causal move. Note that we include moves in this causal set that are not
strictly causal in that they shift a node to another on the same horocycle. We use
the notation $w \asymp 0$ for such a move. The reasons for this will become clear when we
examine autoregressive models.
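
The move calculus above can be exercised on a concrete indexing of the tree. The sketch below makes several assumptions that are not part of the text: nodes are pairs `(m, n)` of scale and shift with `n >= 0`, the symbols $\bar\gamma$, $\alpha$, $\beta$, $\delta$ are written as the letters `g`, `a`, `b`, `d`, and the function names are hypothetical. It applies words left-most symbol first, computes word lengths as in (2.13), and enumerates the non-empty causal words of length at most `p`.

```python
from itertools import product

# A node is (m, n): scale m and shift n >= 0 (an indexing assumption).
MOVES = {
    "g": lambda t: (t[0] - 1, t[1] // 2),      # backward shift (gamma-bar)
    "a": lambda t: (t[0] + 1, 2 * t[1]),       # left forward shift (alpha)
    "b": lambda t: (t[0] + 1, 2 * t[1] + 1),   # right forward shift (beta)
    "d": lambda t: (t[0], t[1] ^ 1),           # interchange (delta)
}

def apply_word(word, t):
    """Apply the symbols of `word` to node t, left-most symbol first."""
    for symbol in word:
        t = MOVES[symbol](t)
    return t

def length(word):
    """Length of a word of the language (2.12); delta counts as two shifts."""
    return sum(2 if symbol == "d" else 1 for symbol in word)

def causal_words(p):
    """Non-empty causal words of length <= p, of the form g^n or g^n d w_ab."""
    words = ["g" * n for n in range(1, p + 1)]
    for n in range(p - 1):                       # ascent g^n before delta
        for k in range(min(n, p - 2 - n) + 1):   # descent no longer than ascent
            for tail in product("ab", repeat=k):
                words.append("g" * n + "d" + "".join(tail))
    return words
```

Under these assumptions the relations (2.8)-(2.11) can be verified node by node, and counting `causal_words(p)` for `p = 1, 2, 3, 4` gives 1, 3, 5 and 9 words.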

Also, on occasion we will find it useful to use a simplified notation for particular
moves. Specifically, we define $\delta^{(n)}$ recursively, starting with $\delta^{(1)} = \delta$ and

If $t = t\bar\gamma\alpha$, then $t\delta^{(n)} = t\bar\gamma\delta^{(n-1)}\alpha$

If $t = t\bar\gamma\beta$, then $t\delta^{(n)} = t\bar\gamma\delta^{(n-1)}\beta$  (2.14)

What $\delta^{(n)}$ does is to map $t$ to another point on the same horocycle in the following
manner: we move up the tree $n$ steps and then descend $n$ steps; the first step in the
descent is the opposite of the one taken on the ascent, while the remaining steps are
the same. That is, if $t = t\bar\gamma^{n-1} w_{\alpha\beta}$ then $t\delta^{(n)} = t\bar\gamma^{n-1} \delta w_{\alpha\beta}$. For example, referring to

Figure  1,5  = 
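
The action of $\delta^{(n)}$ described above can be sketched on a dyadic indexing of the tree. This is an illustration under assumptions not made in the text: nodes are pairs `(m, n)` of scale and shift with `n >= 0`, the left and right children of `(m, n)` are `(m + 1, 2n)` and `(m + 1, 2n + 1)`, and the function names are hypothetical. Under this indexing the recursion (2.14) amounts to flipping one binary digit of the shift.

```python
def parent(t):
    """Backward shift: one step toward -infinity."""
    return (t[0] - 1, t[1] // 2)

def child(t, bit):
    """Forward shift: bit 0 is the left child, bit 1 the right child."""
    return (t[0] + 1, 2 * t[1] + bit)

def interchange(t):
    """The operator delta: move to the sibling of t."""
    return (t[0], t[1] ^ 1)

def delta_n(t, n):
    """delta^(n) via the recursion (2.14), starting from delta^(1) = delta."""
    if n == 1:
        return interchange(t)
    last_bit = t[1] & 1        # 0 if t is a left child, 1 if a right child
    return child(delta_n(parent(t), n - 1), last_bit)

def delta_n_closed_form(t, n):
    """Same move written directly: flip binary digit n - 1 of the shift."""
    return (t[0], t[1] ^ (1 << (n - 1)))
```

Both functions keep the scale `m` unchanged, which reflects the fact that $\delta^{(n)}$ stays within a horocycle.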

The preceding development provides us with the move structure required for the
specification of local dynamics on trees. Let us turn next to the specification of
"shift-invariant" systems and processes. The most general linear input/output relationship
for signals defined on the tree is simply

$$ y_t = \sum_{s \in T} h_{t,s} \, u_s \tag{2.15} $$

As with temporal systems, one would expect the requirements of various notions
of shift-invariance to impose constraints on the weighting coefficients $h_{t,s}$. To see this
let us first adopt an abuse of notation commonly used for time series. Specifically, if
$\tau$ is an isometry of $T$, we use the same notation to denote an operation on signals
over $T$, i.e.

$$ \tau(y)_t = y_{\tau(t)} \tag{2.16} $$

(analogous to $z^{-1} y_t = y_{t-1}$). A first, rather strong notion of shift-invariance might
be that if $\tau(u)$ is applied to the system for any isometry $\tau$, then the output is $\tau(y)$,
where $y$ is the response to $u$. It is not difficult to check that for this to be the case
we must have that

$$ h_{t,s} = h(d(s,t)) \tag{2.17} $$

Note, however, that this is an exceedingly strong condition and indeed generalizes
the notion of zero-phase LTI systems, i.e. systems with impulse responses such that
$h(t,s) = h(|t - s|)$. Such systems obviously are not causal, and in fact are far too
constrained in that they require invariance to too many isometries. In particular
such an LTI system has the property that it is not only translation-invariant but also
reversal invariant (i.e. $u(-t)$ yields $y(-t)$). In the case of time series we overcome this
by using the smaller group of isometries generated by the shift $z^{-1}$. On $T$, however,
the shifts $\bar\gamma$, $\alpha$, and $\beta$ are not isometries. For this reason it is necessary to introduce
a subgroup of isometries of $T$ corresponding to the other role played by $z^{-1}$, that of
defining backward, causal, translations.

Specifically, let $(t_n)_{n \in Z}$ denote an infinite path extending in $T$ back toward $-\infty$
(as $n \to -\infty$). A (one step) translation with skeleton $(t_n)$ is an isometry $\tau$ of $T$ that
has the property that

$$ \tau(t_n) = t_{n+1} \tag{2.18} $$

Since there are many such paths $(t_n)$ there obviously are many translations, and
indeed for any particular $(t_n)$ there are numerous translations (see Figure 5).
Nevertheless the class of translations represents a proper subset of all isometries, and does
allow us to define a very useful notion of shift invariance:

Definition 1 (stationary systems) A linear system $H$ as in (2.15), acting on signals
on $T$, is said to be a stationary system if*

$$ H \circ \tau = \tau \circ H \tag{2.19} $$

for any translation $\tau$.

A fundamental result proven in [9] is that $H$ is stationary if and only if its weighting
pattern satisfies

$$ h_{t,s} = h[d(t, s \wedge t), d(s, s \wedge t)] \tag{2.20} $$

Thus a stationary system is specified by a 2-D sequence $h(n,m)$, $n, m \ge 0$ and,
referring to Figure 1, we see that (2.20) has an intuitively appealing interpretation.
Specifically $s \wedge t$ denotes the "parent" node of $s$ and $t$, i.e. the finest scale node
that has both $s$ and $t$ as descendants, and (2.20) states that $h_{t,s}$ depends only on the
distances in scale from this parent node to $s$ and to $t$. Roughly speaking, the influence
of the input at node $s$ on the output at node $t$ in a stationary system depends on
the differences in scale and in temporal offset of the scale/shift pairs represented by
$t$ and $s$.

Obviously, a system satisfying (2.17) (and thus corresponding to a system that
commutes with all isometries) also satisfies (2.20) (this is easily seen since
$d(s,t) = d(s, s \wedge t) + d(t, s \wedge t)$). The reverse is certainly not true, indicating that we have a far
larger class of stationary systems as defined in Definition 1. Similarly, we can define
a larger class of shift-invariant processes:

Definition 2 (stationary stochastic processes) A zero mean (scalar) stochastic
process $y$ is said to be stationary if its covariance function is translation-invariant,
i.e.

$$ r_{s,t} = r_{\tau(s),\tau(t)} \tag{2.21} $$

for any translation $\tau$.

* In (2.19), $\circ$ denotes the composition of maps.
As shown in [9] a process is stationary if and only if

$$ r_{s,t} = r[d(s, s \wedge t), d(t, s \wedge t)] \tag{2.22} $$

Thus a stationary process is specified by a 2-D sequence $r(n,m)$, $n, m \ge 0$. Also,
isotropic processes, i.e. processes for which (2.21) is satisfied for all isometries and
for which (2.2) holds, are obviously stationary, but the reverse implication is not
true, so that stationary processes represent a richer class of processes. Furthermore
the covariance structure (2.22) in essence says that the statistical relationship between
the values of a stationary process at two nodes depends on the differences in scale
and in temporal offset of the two nodes. In particular from (2.22) it follows that
the statistical behavior of the restriction of a stationary process to any scale (i.e.
horocycle) does not depend on the scale, indicating that the concept of stationarity
on the tree appears to be a natural and convenient one for capturing a notion of
statistical self-similarity. Moreover, as we will see, the Haar transform yields the
eigenstructure of the process at any scale, providing another tie back to wavelet
transforms. In Section 4, we expand on these and related points in the context of
the investigation of a class of finite-dimensional state models on dyadic trees that,
in the constant-coefficient case, provides us with the class of rational linear systems
satisfying the notion of stationarity we have introduced.
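
The remark that the Haar transform yields the eigenstructure at any scale can be checked numerically on a finite piece of a horocycle. The sketch below is an illustration only: it assumes a block of `2**levels` adjacent nodes indexed `0, 1, ...` in dyadic order, so that two nodes `i` and `j` merge `(i ^ j).bit_length()` steps up, and it uses arbitrary example values for $r(k,k)$. The function names are hypothetical.

```python
import numpy as np

def horocycle_cov(levels, rho):
    """Covariance of a stationary process on 2**levels nodes of one horocycle.

    rho[k] is the covariance of two nodes whose paths toward -infinity merge
    k steps up, i.e. r(k, k) in the notation of (2.22).
    """
    size = 2 ** levels
    cov = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            cov[i, j] = rho[(i ^ j).bit_length()]
    return cov

def haar_matrix(levels):
    """Orthonormal Haar basis of R^(2**levels), one basis vector per row."""
    size = 2 ** levels
    rows = [np.ones(size) / np.sqrt(size)]
    for scale in range(levels):              # 2**scale wavelets at this scale
        width = size >> scale                # support of each wavelet
        for shift in range(2 ** scale):
            row = np.zeros(size)
            start = shift * width
            row[start:start + width // 2] = 1.0
            row[start + width // 2:start + width] = -1.0
            rows.append(row / np.sqrt(width))
    return np.array(rows)

levels = 3
rho = [1.0, 0.6, 0.3, 0.1]     # illustrative values of r(k, k), k = 0..3
cov = horocycle_cov(levels, rho)
haar = haar_matrix(levels)
transformed = haar @ cov @ haar.T
off_diagonal = transformed - np.diag(np.diag(transformed))
```

Here `transformed` comes out diagonal, so the Haar vectors are eigenvectors of the within-scale covariance, and the same eigenvalues are obtained whatever scale the block is taken from.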

Let us close this discussion with a few comments. First, as shown in [9], the notions
of systems and stochastic stationarity introduced in Definitions 1 and 2 are compatible
in the sense that the output of a stationary system driven by a stationary input is itself
stationary. In general, however, an isotropic process driving an arbitrary stationary
system does not yield an isotropic output, and thus we might expect that we will have
to work harder to pinpoint the class of systems that does generate isotropic processes.
Furthermore, as we have indicated, we are interested in constructing causal models,
i.e. systems as in (2.15) with

$$ h_{t,s} = 0 \quad \text{for } t \prec s \tag{2.23} $$

For stationary systems this corresponds to requiring

$$ h(d(t, s \wedge t), d(s, s \wedge t)) = 0 \quad \text{for } d(t, s \wedge t) < d(s, s \wedge t) \tag{2.24} $$
Finally, let us make a brief comment about the generalization of the third use
of $z^{-1}$, namely to define transforms. Specifically, as discussed in [6, 7, 8, 9], natural
objects to consider in this context are noncommutative formal power series of the
form:

$$ S = \sum_{w \in \mathcal{L}} s_w \cdot w \tag{2.25} $$

We will use such transforms in the next section in order to encode correlation
functions in our generalization of the Schur recursions. In addition, transforms of this
type can be used to encode convolutional systems. Specifically, we can think of (2.25)
as defining the system function of a system in the following manner: if the input to
this system is $u_t$, $t \in T$, then the output is given by the generalized convolution:

$$ (Su)_t = \sum_{w \in \mathcal{L}} s_w \, u_{tw} \tag{2.26} $$

Note that in this context causality corresponds to $s_w = 0$ for all $0 \prec w$. Also it is
important to realize that while (2.25), (2.26) would seem to correspond to a general
class of shift-invariant systems, both classes of systems we have described, stationary
and isotropic, require further restrictions. In particular for $S$ in (2.25), (2.26) to be
stationary we must have that if $w = \bar\gamma^n \delta w_{\alpha\beta}$ then $s_w$ depends only on $n$ and $|w_{\alpha\beta}|$.
Similarly, $S$ is isotropic if $s_w$ depends only on $|w|$. Finally, for future reference we use
the notation $S(0)$ to denote the coefficient of the empty word in $S$. Also it will be
necessary for us to consider particular shifted versions of $S$:

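$$ \bar\gamma[S] = \sum_{w \in \mathcal{L}} s_{w\bar\gamma} \cdot w \tag{2.27} $$

$$ \delta^{(k)}[S] = \sum_{w \in \mathcal{L}} s_{w\delta^{(k)}} \cdot w \tag{2.28} $$

where we use (2.8)-(2.11) and (2.14) to write $w\bar\gamma$ and $w\delta^{(k)}$ as elements of $\mathcal{L}$. Notice
that, because of the relations (2.8)-(2.11), the operators $S \mapsto \bar\gamma[S]$ and $S \mapsto \delta^{(k)}[S]$
cannot be thought of as multiplication operators on formal power series.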

3  Isotropic Processes and Multiscale Autoregressive Models

In this section we investigate how multiscale isotropic processes may be finitely
parametrized and how properties of processes may be checked on their associated
parametrizations. In particular, as for time series it is of considerable interest to develop
white-noise-driven models for processes on trees and, more specifically, models
that are in some sense of finite order. Also, as we have discussed in the preceding
section, we are interested in developing a framework for constructing models that
possess a causal structure in scale. Motivated by the theory of AR representations
for temporally-stationary stochastic processes, we focus attention here on the class of
multiscale AR models, where the $p$th-order version of such a model has the form

$$ y_t = \sum_{w \preceq 0,\ |w| \le p} a_w \, y_{tw} + \sigma W_t \tag{3.1} $$

where $W_t$ is white noise (i.e. it is uncorrelated from node to node) with unit variance.
The form of (3.1) deserves some comment. A first question that arises is: why not
look instead at models in which $y_t$ depends only on its "strict" past, i.e. on points
of the form $t\bar\gamma^n$? As shown in [6, 7, 8], the only model of this type that yields an
isotropic output is the first-order version of (3.1), i.e.

$$ y_t = a \, y_{t\bar\gamma} + \sigma W_t \tag{3.2} $$

Indeed higher-order versions of such a model yield stationary processes in the sense
of Definition 2 and as considered in the next section. Secondly, note that the
constraints placed on $w$ in the summation of (3.1) state that $y_t$ is a linear combination
of the white noise $W_t$ and the values $y_{tw}$ at nodes that are both at distances at most
$p$ from $t$ (i.e. $|w| \le p$) and also on the same or previous horocycles ($w \preceq 0$). Thus
the model (3.1) is not strictly "causal" and is indeed an implicit specification since
values of $y$ on the same horocycle depend on each other through (3.1). For example,
consider the AR(2) process, which, specializing (3.1), has the form

$$ y_t = a_1 y_{t\bar\gamma} + a_2 y_{t\bar\gamma^2} + a_3 y_{t\delta} + \sigma W_t \tag{3.3} $$
Note first that this is indeed an implicit specification, since if we evaluate (3.3) at $t\delta$
rather than $t$ we see that

$$ y_{t\delta} = a_1 y_{t\bar\gamma} + a_2 y_{t\bar\gamma^2} + a_3 y_t + \sigma W_{t\delta} \tag{3.4} $$

The structure of (3.3), (3.4) reveals that for a second-order model we need to consider
simultaneously the coupled propagation of pairs of values $y_t, y_{t\delta}$. It also suggests that
perhaps the implicit representation of (3.1) is not the ideal one. To add further
credence to this, note that the AR(2) model has four coefficients (three
$a$'s and $\sigma$), while for second-order time series there would only be two $a$'s. Indeed
this disparity grows with increasing order as the number of coefficients $a_w$ in (3.1)
grows geometrically with $p$. On the other hand, as shown in [6] the constraints of
isotropy place nonlinear and rather unwieldy constraints on these coefficients. For
these reasons there is strong motivation to consider an alternate representation for
isotropic AR models. Again it is useful to contrast the situation on $T$ with that on $Z$.
In particular, there are two equally useful parametrizations for $p$th order AR models
for stationary time series: in terms of the $p$ lagged coefficients $a_n$, $1 \le n \le p$, or in
terms of the $p$ reflection or partial correlation (PARCOR) coefficients $k_n$, $1 \le n \le p$,
used in lattice filter representation of AR models. For time series, increasing the
order by one increases the number of $a$'s and $k$'s by one. For multiscale AR models,
increasing the order by one doubles the number of $a$'s, although these are subject to
a (growing!) number of nonlinear constraints. However, as we will see, if we switch
to the alternate PARCOR representation, we will again need to add only one
new coefficient and will avoid completely the need for nonlinear constraints.
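
As a concrete check of the first-order model (3.2), the covariance it induces on a finite piece of the tree can be computed exactly and compared with the isotropic form $r_{d(s,t)} = r_0 a^{d(s,t)}$. The sketch below rests on assumptions not made in the text: the tree is truncated to a finite depth with heap indexing (node 0 at the top, parent of node `i` equal to `(i - 1) // 2`), and the top node is given the steady-state variance $\sigma^2/(1 - a^2)$ in place of the infinite past toward $-\infty$. The function names are hypothetical.

```python
import numpy as np

def ar1_tree_covariance(a, sigma, depth):
    """Exact covariance of y_t = a * y_parent(t) + sigma * W_t on a finite tree."""
    count = 2 ** (depth + 1) - 1
    cov = np.zeros((count, count))
    cov[0, 0] = sigma ** 2 / (1.0 - a ** 2)    # assumed steady-state variance
    for node in range(1, count):
        par = (node - 1) // 2
        # W_node is uncorrelated with every previously generated value.
        cov[node, :node] = a * cov[par, :node]
        cov[:node, node] = cov[node, :node]
        cov[node, node] = a ** 2 * cov[par, par] + sigma ** 2
    return cov

def tree_distance(i, j):
    """Number of edges between heap-indexed nodes i and j."""
    steps = 0
    while i != j:
        if i > j:
            i = (i - 1) // 2
        else:
            j = (j - 1) // 2
        steps += 1
    return steps

a, sigma, depth = 0.7, 1.0, 3
cov = ar1_tree_covariance(a, sigma, depth)
r0 = sigma ** 2 / (1.0 - a ** 2)
isotropic = np.array([[r0 * a ** tree_distance(i, j) for j in range(len(cov))]
                      for i in range(len(cov))])
max_error = float(np.abs(cov - isotropic).max())
```

Under these assumptions `max_error` is at the level of rounding error, i.e. the covariance depends on the two nodes only through their distance.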

To begin, recall that the basic idea behind the Levinson algorithm for the construction
of AR models of increasing order for stationary time series involves the consideration
of both forward and backward predictions of the series based on increasing intervals
of data. Specifically, consider an ordinary time series $x_k$ and introduce the spaces
$X_{k,n} = \mathcal{H}\{x_k, \ldots, x_{k-n}\}$, where $\mathcal{H}\{\cdots\}$ denotes the linear span of the random variables
indicated between the braces. Forward and backward prediction errors or "residuals"
are defined as $e_{k,n} = x_k - E(x_k | X_{k-1,n-1})$ and $f_{k,n} = x_{k-n} - E(x_{k-n} | X_{k,n-1})$,
respectively. The formulae

$$ \begin{aligned}
e_{k,n+1} &= x_k - E\{x_k | X_{k-1,n}\} \\
&= x_k - E\{x_k | X_{k-1,n-1}\} - E\{x_k | X_{k-1,n} \ominus X_{k-1,n-1}\} \\
&= e_{k,n} - E\{x_k | X_{k-1,n} \ominus X_{k-1,n-1}\} \\
&= e_{k,n} - E\{e_{k,n} | f_{k-1,n}\} \\
&= e_{k,n} - k_n f_{k-1,n}
\end{aligned} \tag{3.5} $$


where $U \ominus V$ denotes the orthogonal complement of $V$ in $U$, show that the key to
the calculation of the $(n+1)$st-order prediction error $e_{k,n+1}$ is the computation of the
prediction of the forward residual $e_{k,n}$ given the backward one $f_{k-1,n}$. Similarly, the
prediction of the backward residual given the forward one is needed for the calculation
of backward residuals of increasing order. It is a remarkable property of stationary
time series that both prediction operators are identical, i.e. that the same coefficient
$k_n$ in (3.5) also appears in the corresponding equation for the backward residual.
This fact, which then leads to the celebrated Levinson recursions, stems from the
fact that the statistics of a stationary time series are invariant under the isometry
$k \mapsto -k$. The correlation coefficient of the two involved residuals is also known
as the PARCOR coefficient of $x_k$ and $x_{k-n}$ given the intervening values
$x_{k-1}, \ldots, x_{k-n+1}$. This is illustrated in the following diagram:

•  o  o  o  o  o  •

Since $e_{k,0} = f_{k,0} = x_k$, we find that (3.5) and the associated Levinson recursion
provide us with a method for constructing models for $x_k$ of increasing order. In
particular, if $e_{k,n}$ and $f_{k,n}$ are white (so that all higher-order PARCOR coefficients
are 0), we obtain an $n$th order AR model for $x_k$ constructed in lattice form, i.e. one
first-order section (specified by one PARCOR coefficient) at a time.
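
For reference, the time-series recursion recalled above can be sketched in a few lines. This is the standard Levinson-Durbin recursion written in a textbook form, not code from the paper. The function name and the sign convention (predictor $\hat{x}_k = \sum_j a_j x_{k-j}$, so that a positively correlated series gives positive reflection coefficients, in line with the minus sign in (3.5)) are choices made here.

```python
def levinson_parcor(r, order):
    """Reflection (PARCOR) coefficients k_1..k_order from covariances r[0..order].

    Returns the reflection coefficients, the predictor coefficients a_1..a_order
    of the final model, and the final prediction error variance.
    """
    coeffs = []          # predictor coefficients of the current order
    error = r[0]         # variance of the current prediction error
    parcor = []
    for m in range(1, order + 1):
        acc = r[m] - sum(coeffs[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / error
        # Order update: a_j <- a_j - k * a_(m-j), then append a_m = k.
        coeffs = [coeffs[j] - k * coeffs[m - 2 - j] for j in range(m - 1)] + [k]
        error *= 1.0 - k * k
        parcor.append(k)
    return parcor, coeffs, error

# Usage: an AR(1) covariance r_n = 0.5**n has a single nonzero PARCOR coefficient.
ar1_parcor, ar1_coeffs, ar1_error = levinson_parcor([0.5 ** n for n in range(5)], 4)
```

A sequence is a valid covariance of a (non-degenerate) stationary time series exactly when every reflection coefficient produced this way has magnitude less than one, which is the test of positive definiteness referred to in Section 2.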

Let us now consider the extension of these ideas to the dyadic tree. As one might
expect from the preceding discussion of AR(2) and as developed in detail in [6, 7, 8],
construction of models of increasing order requires the consideration of vectors of
forward and backward residuals of dimension that increases with model order. To
begin, let $y_t$ be an isotropic process on a tree, and define the ($n$th-order) past of the
node $t$ on $T$:

$$ \mathcal{Y}_{t,n} = \mathcal{H}\{y_{tw} : w \preceq 0,\ |w| \le n\} \tag{3.6} $$

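In analogy with the time series case, the backward innovations or prediction error
space, which we denote by $\mathcal{F}_{t,n}$, are defined as the variables spanning the new
information in $\mathcal{Y}_{t,n}$ which are orthogonal to $\mathcal{Y}_{t,n-1}$:

$$ \mathcal{Y}_{t,n} = \mathcal{Y}_{t,n-1} \oplus \mathcal{F}_{t,n} \tag{3.7} $$

so that $\mathcal{F}_{t,n}$ is the orthogonal complement of $\mathcal{Y}_{t,n-1}$ in $\mathcal{Y}_{t,n}$ (i.e. $\mathcal{F}_{t,n} = \mathcal{Y}_{t,n} \ominus \mathcal{Y}_{t,n-1}$) for
$n > 0$, while $\mathcal{F}_{t,0} = \mathcal{Y}_{t,0}$. A basis for $\mathcal{F}_{t,n}$ can be obtained by defining the backward
prediction errors for the "new" elements of the "past" introduced at the $n$th step, i.e.
for $w \preceq 0$ and $|w| = n$, define

$$ F_{t,n}(w) = y_{tw} - E(y_{tw} | \mathcal{Y}_{t,n-1}) \tag{3.8} $$

Then

$$ \mathcal{F}_{t,n} = \mathcal{H}\{F_{t,n}(w) : |w| = n,\ w \preceq 0\} \tag{3.9} $$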

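Similarly we introduce the forward innovations or prediction error space, which
we denote by $\mathcal{E}_{t,n}$. For $n = 0$, $\mathcal{E}_{t,0} = \mathcal{H}\{y_t\}$, while for $n > 0$

$$ \mathcal{E}_{t,n} = (\mathcal{Y}_{t,n-1} + \mathcal{Y}_{t\bar\gamma,n-1}) \ominus \mathcal{Y}_{t\bar\gamma,n-1} \tag{3.10} $$

Note that $\mathcal{Y}_{t,n-1} + \mathcal{Y}_{t\bar\gamma,n-1}$ is used here instead of $\mathcal{Y}_{t,n}$; while both spaces are equal
in the case of ordinary time series (in which $\bar\gamma$ is replaced by $z^{-1}$), they differ here*.
To obtain a basis for $\mathcal{E}_{t,n}$ we define the forward innovations

$$ E_{t,n}(w) = y_{tw} - E(y_{tw} | \mathcal{Y}_{t\bar\gamma,n-1}) \tag{3.11} $$

where $w$ ranges over a set of words such that $tw$ is on the same horocycle as $t$ and at
a distance at most $n - 1$ from $t$ (so that $\mathcal{Y}_{t\bar\gamma,n-1}$ is the past of that point as well), i.e.
$|w| < n$ and $w \asymp 0$. Then

$$ \mathcal{E}_{t,n} = \mathcal{H}\{E_{t,n}(w) : |w| < n \text{ and } w \asymp 0\} \tag{3.12} $$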

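* For example $\mathcal{Y}_{t,2}$ consists of $y_t$, $y_{t\bar\gamma}$, $y_{t\bar\gamma^2}$ and $y_{t\delta}$. However, $\mathcal{Y}_{t,1}$ consists of $y_t$ and $y_{t\bar\gamma}$, while
$\mathcal{Y}_{t\bar\gamma,1}$ consists of $y_{t\bar\gamma}$ and $y_{t\bar\gamma^2}$.
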
Let $E_{t,n}$ and $F_{t,n}$ denote column vectors of the elements $E_{t,n}(w)$ and $F_{t,n}(w)$,
respectively. As $n$ increases the dimensions of these residual vectors grow geometrically.
Levinson recursions for isotropic processes involve the recursive computation of $F_{t,n}$
and $E_{t,n}$ as $n$ increases. Since $E_{t,0}$ and $F_{t,0}$ both equal $y_t$, these recursions yield lattice
structures for AR models of increasing order. As developed in [6] and as the
reader may guess from the results for time series, the key to these recursions are all
PARCOR coefficients involving an arbitrary pair {□, ◇} given the space spanned by
the ○ in Figure 6. Furthermore, it can be verified that suitable combinations of the
elementary isometries shown in this figure provide isometries

•  leaving  the  space  yt-.a  (circles)  globally  invariant 

•  exchanging two arbitrary □'s or the two ◇'s.

From this it follows that all pairs {□, ◇} possess the same PARCOR coefficients
given the space spanned by the circles. Hence, as for time series, we can show in
general that a single PARCOR or reflection coefficient is involved in each stage of the
Levinson recursions. Similar uses of the symmetries of the tree and the correlation
structure of isotropic processes allow us to show that only the barycenters of the
forward and backward prediction error vectors are needed to compute these reflection
coefficients. These barycenters are defined as follows, with $[x]$ the integer part of $x$:

$$ e_{t,n} = 2^{-[\frac{n-1}{2}]} \sum_{|w| < n,\ w \asymp 0} E_{t,n}(w) $$

$$ f_{t,n} = 2^{-[\frac{n}{2}]} \sum_{|w| = n,\ w \preceq 0} F_{t,n}(w) $$

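In particular in [6] the following results are proven, providing a generalization of
the Levinson recursions to the barycentric prediction errors for isotropic processes on
$T$:

Theorem 1 (barycentric Levinson recursions) For $n$ even:

$$ e_{t,n} = e_{t,n-1} - k_n f_{t\bar\gamma,n-1} \tag{3.13} $$

$$ f_{t,n} = \tfrac{1}{2}\left( f_{t\bar\gamma,n-1} + e_{t\delta^{(n/2)},n-1} \right) - k_n e_{t,n-1} \tag{3.14} $$

where

$$ k_n = \mathrm{cor}\left(e_{t,n-1}, f_{t\bar\gamma,n-1}\right) = \mathrm{cor}\left(e_{t\delta^{(n/2)},n-1}, e_{t,n-1}\right) = \mathrm{cor}\left(e_{t\delta^{(n/2)},n-1}, f_{t\bar\gamma,n-1}\right) \tag{3.15} $$

and $\mathrm{cor}(x,y) = E(xy)/[E(x^2)E(y^2)]^{1/2}$.

For $n$ odd:

$$ e_{t,n} = \tfrac{1}{2}\left( e_{t,n-1} + e_{t\delta^{((n-1)/2)},n-1} \right) - k_n f_{t\bar\gamma,n-1} \tag{3.16} $$

$$ f_{t,n} = f_{t\bar\gamma,n-1} - \tfrac{1}{2} k_n \left( e_{t,n-1} + e_{t\delta^{((n-1)/2)},n-1} \right) \tag{3.17} $$

where

$$ k_n = \mathrm{cor}\left( \tfrac{1}{2}\left( e_{t,n-1} + e_{t\delta^{((n-1)/2)},n-1} \right), f_{t\bar\gamma,n-1} \right) \tag{3.18} $$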

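Corollary: The variances of the barycenters satisfy the following recursions.

For $n$ even

$$ \sigma^2_{e,n} = E(e_{t,n}^2) = (1 - k_n^2)\, \sigma^2_{f,n-1} \tag{3.19} $$

$$ \sigma^2_{f,n} = E(f_{t,n}^2) = \left( \tfrac{1}{2} + \tfrac{k_n}{2} - k_n^2 \right) \sigma^2_{f,n-1} \tag{3.20} $$

where $k_n$ must satisfy

$$ -\tfrac{1}{2} \le k_n \le 1 \tag{3.21} $$

For $n$ odd

$$ \sigma^2_{e,n} = \sigma^2_{f,n} = (1 - k_n^2)\, \sigma^2_{f,n-1} \tag{3.22} $$

where

$$ -1 \le k_n \le 1 \tag{3.23} $$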
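
The role of the tighter lower bound for even $n$ can be illustrated on the smallest case. The sketch below is a consistency check under stated assumptions, not a result quoted from [6]: it takes the first two reflection coefficients to be the ordinary correlation and partial correlation, $k_1 = r_1/r_0$ and $k_2 = (r_0 r_2 - r_1^2)/(r_0^2 - r_1^2)$, and compares the bound $k_2 \ge -1/2$ with non-negative definiteness of the covariance of a node together with its three neighbours (pairwise at distance two). The function names are hypothetical.

```python
import numpy as np

def first_two_reflection_coefficients(r0, r1, r2):
    """k_1 and k_2, written here as a correlation and a partial correlation."""
    k1 = r1 / r0
    k2 = (r0 * r2 - r1 ** 2) / (r0 ** 2 - r1 ** 2)
    return k1, k2

def star_covariance(r0, r1, r2):
    """Covariance of a node and its three neighbours under isotropy.

    Index 0 is the centre; the neighbours are at distance 1 from it and at
    distance 2 from one another.
    """
    cov = np.full((4, 4), float(r2))
    cov[0, :] = r1
    cov[:, 0] = r1
    np.fill_diagonal(cov, r0)
    return cov

def is_nonnegative_definite(mat, tol=1e-12):
    """True when the smallest eigenvalue is non-negative up to `tol`."""
    return bool(np.linalg.eigvalsh(mat).min() >= -tol)
```

For instance, $(r_0, r_1, r_2) = (1, 0, -0.6)$ gives $k_2 = -0.6$, which is admissible for a time series but lies below $-1/2$, and the corresponding four-node matrix is indeed not non-negative definite; $(1, 0, -0.4)$ passes both tests.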

As we had indicated previously, the constraint of isotropy represents a significantly
more severe constraint on the covariance sequence $r(n)$ of an isotropic process than
on that for a stationary time series. It is interesting to note that these additional
constraints appear in the preceding development only in the form of the simple
modification (3.21) of the constraint on $k_n$ for $n$ even over the form (3.23) that one also
finds in the corresponding theory for time series. Also, as with the usual Levinson
recursions for time series, we can use the formulae in Theorem 1 and its corollary
to obtain explicit recursions for the computation of the $k_n$ sequence directly from
the given covariance data, $r(n)$. These recursions also contain some differences from
the usual results, reflecting the constraints of isotropy on the tree. Rather than
displaying these we describe here an alternative computational procedure generalizing
the so-called Schur recursions [30, 43] for the cross-spectral densities between a given
time series and its forward and backward prediction errors. In considering the
generalization of these recursions to isotropic processes on trees, we must replace the
z-transform power series for cross-spectral densities by corresponding formal power
series of the type introduced in Section 2. Specifically for $n \ge 0$ define $P_n$ and $Q_n$ as:

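$$ P_n \triangleq \mathrm{cov}(y_t, e_{t,n}) = \sum_{w \preceq 0} E\left(y_t\, e_{tw,n}\right) \cdot w \qquad (3.24) $$

$$ Q_n \triangleq \mathrm{cov}(y_t, f_{t,n}) = \sum_{w \preceq 0} E\left(y_t\, f_{tw,n}\right) \cdot w \qquad (3.25) $$

where we begin with $P_0$ and $Q_0$ specified in terms of the correlation function $r_n$ of $y_t$:

$$ P_0 = Q_0 = \sum_{w \preceq 0} r_{|w|} \cdot w \qquad (3.26) $$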

Recalling the definitions (2.27), (2.28) of $\gamma[S]$ and $\delta^{(k)}[S]$ for S a formal power series
and letting S(0) denote the coefficient of w = 0, we have the following generalization
of the Schur recursions, proven in [6]:

Theorem 2 (Schur recursions) The following Schur recursions on formal power
series yield the sequence of reflection coefficients.

For n even

$$ P_n = P_{n-1} - k_n \gamma[Q_{n-1}] \qquad (3.27) $$

$$ Q_n = \tfrac{1}{2}\left( \gamma[Q_{n-1}] + \delta^{(n/2)}[P_{n-1}] \right) - k_n P_{n-1} \qquad (3.28) $$

where

$$ k_n = \frac{\gamma[Q_{n-1}](0) + \delta^{(n/2)}[P_{n-1}](0)}{2 P_{n-1}(0)} \qquad (3.29) $$

For n odd

$$ P_n = \tfrac{1}{2}\left( P_{n-1} + \delta^{((n-1)/2)}[P_{n-1}] \right) - k_n \gamma[Q_{n-1}] \qquad (3.30) $$

$$ Q_n = \gamma[Q_{n-1}] - \tfrac{1}{2} k_n \left( P_{n-1} + \delta^{((n-1)/2)}[P_{n-1}] \right) \qquad (3.31) $$

where

$$ k_n = \frac{2 \gamma[Q_{n-1}](0)}{P_{n-1}(0) + \delta^{((n-1)/2)}[P_{n-1}](0)} \qquad (3.32) $$

Theorems 1 and 2 provide us with the right way in which to parametrize isotropic
processes. Furthermore, as developed in [6, 7, 8], we can build on these results to
provide a complete generalization of the Wold decomposition of an isotropic process.
In particular, lattice structures can be constructed for whitening filters, i.e. for the
computation of the prediction error vectors $E_{t,n}$ and $F_{t,n}$ as outputs when $y_t$ is taken
as input. Similarly, lattice forms are derived in [7] for modeling filters, i.e. systems
whose output is the isotropic process when the input is the corresponding-order
prediction error. Figure 2 illustrates the output, along one horocycle, of a third-order
modeling filter (i.e. an AR(3) model) driven by a white $E_{t,3}$ process. We note that a
major difference between these lattice structures and the usual ones for time series is
that they involve lattice blocks of growing dimension, capturing the coupling along a
horocycle for AR processes of higher order. Also, as with time series, statistical
properties of isotropic processes may be checked using the parametrization via reflection
coefficients. The main results are now listed, and we again refer the reader to [7, 8]
for more precise formulations of these results and their proofs.


Theorem 3 (checking properties via reflection coefficients)

1. Characterization of AR processes: an isotropic process is AR(n) if and
only if its reflection coefficients of order > n are all zero.

2. Schur criterion: if the sequence $(r_n)$ is the covariance function of an isotropic
process, then the Schur recursions must yield reflection coefficients satisfying the
inequalities

$$ -1 < k_{2n+1} < +1, \qquad -\tfrac{1}{2} < k_{2n} < +1 \qquad (3.33) $$

3. Parametrizing AR processes: conversely, a finite family of coefficients
satisfying the above strict inequalities (3.33) defines a unique isotropic AR
process.

4. Regular and singular processes: if the sequence $(k_n)$ satisfies the strict
inequalities (3.33) and furthermore the condition

$$ \sum_{n=1}^{\infty} \left( k_{2n+1}^2 + |k_{2n}| \right) < \infty $$

holds true, then it is the reflection coefficient sequence of a regular (i.e. purely
nondeterministic) isotropic process.
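As a small illustration of the Schur criterion in Theorem 3, the following sketch checks a finite list of reflection coefficients against the strict inequalities (3.33), read here as $-1 < k_n < 1$ for odd n and $-1/2 < k_n < 1$ for even n. The function name and the list layout (`k[0]` holds $k_1$) are assumptions made for this example, not notation from [6, 7, 8].

```python
def satisfies_schur_criterion(k):
    """Return True if the coefficients k_1, k_2, ... satisfy (3.33).

    `k` is a sequence whose first entry is k_1 (hypothetical layout).
    Odd-order coefficients must lie in (-1, 1); even-order coefficients
    must lie in (-1/2, 1), the tighter bound imposed by isotropy.
    """
    for order, value in enumerate(k, start=1):
        lower = -0.5 if order % 2 == 0 else -1.0
        if not (lower < value < 1.0):
            return False
    return True


# k_2 = -0.4 is admissible, whereas k_2 = -0.6 violates the even-order bound.
ok_family = [0.9, -0.4, -0.9]
bad_family = [0.5, -0.6]
```

By item 3 of the theorem, a finite family that passes this check parametrizes an isotropic AR process; a family such as `bad_family` would be acceptable for a time series but not for an isotropic process on the tree.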

The first three of these results represent easily understood generalizations of results
for time series. For example, they imply that the nth and higher-order prediction error
vectors of an AR(n) process are white noise processes. The fourth statement concerns
itself with the issue of whether or not the value of $y_t$ can be perfectly predicted based
on data in its (infinite) past. Specifically, an isotropic process $y_t$ is regular or purely
nondeterministic if

$$ \sigma^2 > 0 \qquad (3.34) $$

holds, where

<7^  =  inf  II  |3^fY-i.ocj  ir  (3.35) 

and  the  infimum  ranges  over  all  collections  of  scalars  {pv>)wv.o  where  only  finitely  many 
of  the  /tu,  are  nonzero  and  the  condition  XI f  's  satisfied.  In  other  words,  no 
nonzero linear combination of the values of $y_t$ on any given horocycle can be predicted
exactly with the aid of knowledge of Y in the strict past, $\mathcal{Y}_{t\gamma^{-1},\infty}$, and the associated
prediction error is uniformly bounded from below. It is interesting to note that the
condition for regularity for isotropic processes involves the absolute sum rather than
the sum of squares of the even reflection coefficients and thus is a stronger condition. This
implies that there is apparently a far richer class of singular processes on T than on
Z. This appears to be related to the complications arising in the Bochner theorem for
isotropic processes on T and to the large size of its boundary. We refer the reader to
[6, 7, 8] for further discussions of these and other points related to isotropic processes
and their AR representations.

4 System Theory and Estimation for Stationary Processes and State Models

In this section we describe some of the basic concepts associated with the analysis of
stationary systems and processes on the dyadic tree. To begin, let us introduce the
following basic systems on T:

$$ (\gamma u)_t = \tfrac{1}{2}\left( u_{t\alpha} + u_{t\beta} \right) \qquad (4.1) $$

$$ (\bar\gamma u)_t = u_{t\bar\gamma} \qquad (4.2) $$

It is not difficult to check that each of these systems is stationary. The system $\bar\gamma$
can be naturally thought of as a “backward” shift towards $-\infty$, corresponding to the
coarse-to-fine interpolation operation in the fine-to-coarse Haar transform, whereas $\gamma$
is a “forward-and-average” shift corresponding to the “Haar smoother”. Using these
operators, it is not difficult to show that a stationary system can be represented as

t.j>0 

Such  a  system  is  causal  if  and  only  if  Sjj  is  nonzero  only  over  the  set  {(1,  j) 
i.e.  only  past  inputs  can  influence  the  considered  output. 
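To make the two basic operators concrete, here is a minimal sketch on a finite dyadic tree. It assumes (this layout is not from the paper) that the signal at scale m is stored as an array of length $2^m$ and that node j at scale m has children 2j and 2j+1 at scale m+1. Under that assumption the forward-and-average shift maps scale m+1 to scale m, and the backward shift copies each parent value to its two children.

```python
import numpy as np


def gamma(u_fine):
    """Forward-and-average shift: (gamma u)_t = (u_{t alpha} + u_{t beta}) / 2.

    Takes the values at scale m+1 and returns values at scale m.
    """
    return 0.5 * (u_fine[0::2] + u_fine[1::2])


def gamma_bar(u_coarse):
    """Backward shift: (gamma_bar u)_t = u_{parent of t}.

    Takes the values at scale m and returns values at scale m+1.
    """
    return np.repeat(u_coarse, 2)


u = np.array([1.0, 3.0, 5.0, 9.0])
# Interpolating and then averaging recovers the signal (gamma gamma_bar = 1) ...
round_trip = gamma(gamma_bar(u))
# ... whereas averaging and then interpolating only smooths it.
smoothed = gamma_bar(gamma(u))
```

The asymmetry between `round_trip` and `smoothed` is the reason the two variables in a transfer function of this kind cannot be treated as independent, as in ordinary 2-D system theory.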

The representation in (4.3) is one of two extremely useful transform-like
representations of stationary systems. This one is, in particular, of use in providing a
generalization of time series results on the effect of linear systems on power spectra
and cross-spectra. Specifically, consider two jointly stationary processes x and y, with
covariance function

$$ E(x_s y_t) = r_{xy}[d(s, s \wedge t), d(t, s \wedge t)] \qquad (4.4) $$

Let us define the cross-spectrum of x and y as the following power series:

fi'-  I  Y.  ’••■'li.jl  f  V 

<.j>0 

Also,  given  a  stationary  transfer  function  as  in  (4.3),  we  introduce  the  following  notion 
of  an  “adjoint”  : 


H-  =  Y 


(4.5)

Then, as shown in [9], if H and K are stationary transfer functions, the processes Hx
and Ky are also jointly stationary (provided they are well defined; see the footnote
below), and we have the following generalization of a well-known result:

^  (4.6) 

Let us now turn to the question of internal, “state” realizations of stationary
systems. In this case an alternate representation to (4.3) is also of value. To define
this we introduce the following family of operators which perform a smoothing of data
on the same horocycle:

$$ \sigma^{[k]} = \bar\gamma^{\,k} \gamma^{k} \qquad (4.7) $$

This operator provides an average of the values of a signal at the $2^k$ nearest points
on the same horocycle. For example, $(\sigma u)_t = \tfrac{1}{2}(u_t + u_{t\delta})$, where $\sigma = \sigma^{[1]}$, and
$(\sigma^{[2]} u)_t = \tfrac{1}{4}(u_t + u_{t\delta} + u_{t\delta^{(2)}} + u_{t\delta\delta^{(2)}})$. Note also that each $\sigma^{[k]}$ is an idempotent
operator. As shown in [9], the operators $\sigma^{[k]}$ may be used to encode any stationary causal
system via a representation of the form:

H  =  E  '“M  ’f"'’''  (4-8) 

t,j>0 
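The horocycle smoothing operators can be sketched numerically as block averages. The code below assumes (as an illustration only) that the values on one horocycle are stored left to right in an array whose length is a multiple of $2^k$, so that the $2^k$ nearest points of a node are the nodes sharing its k-th ancestor.

```python
import numpy as np


def sigma(u, k):
    """Replace each value by the mean over its aligned block of 2**k neighbours."""
    block = 2 ** k
    means = u.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)


u = np.array([1.0, 3.0, 5.0, 9.0, 0.0, 2.0, 4.0, 6.0])
s1 = sigma(u, 1)                 # pairwise averages
s2 = sigma(u, 2)                 # averages over blocks of four
s1_twice = sigma(sigma(u, 1), 1)
s2_twice = sigma(sigma(u, 2), 2)
```

Here `s2_twice` equals `s2` and `s1_twice` equals `s1`, which is the idempotency noted above. The same computation shows that applying the pairwise average twice does not produce the four-point average, so the operator with index 2 is not the square of the operator with index 1.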

In order to develop a realization theory for stationary systems, let us note that
both formulae (4.3) and (4.8) are strikingly similar to the forms of system functions
studied in standard 2-D system theory. While there are obvious differences (e.g.
we have the relation $\gamma\bar\gamma = 1$ between the two variables in (4.3), and the symbol
$\sigma^{[2]}$ is not simply interpretable as the square of $\sigma$), it is indeed possible to build on
standard 2-D realization theories. Note in particular that even though (4.3) includes
noncausal multiscale systems, it has the appearance of a 2-D quadrant-causal system,
as does (4.8), since the summations are restricted to $i, j \ge 0$. Let us begin with (4.3).
Building on the 2-D analogy, if we interpret $\gamma$ as the row operator and $\bar\gamma$ as the
column operator, then it is natural to consider row-by-row scanning to define a
total ordering on the 2-D index space. This corresponds to decomposing the transfer
function H according to the following two steps:

1. a bottom-up (i.e. fine-to-coarse) smoothing, followed by

2. a top-down (i.e. coarse-to-fine) propagation.

Footnote (on the joint stationarity of Hx and Ky): This, of course, is true only if Hx
and Ky are well-defined, i.e. if they are finite-variance processes. As one might expect,
this requires some notion of stability for the systems. We return to this point later in
this section in the context of state models.

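2-D system theory for systems having separable denominator [4, 32] may be applied
here. Rational transfer functions in this latter case are of the following form:

$$ H = C \left( I - \bar\gamma A_{\bar\gamma} \right)^{-1} P \left( I - \gamma A_{\gamma} \right)^{-1} B \qquad (4.9) $$

which yields the following state space form

$$ \begin{cases} v_t = A_{\gamma}\, \tfrac{1}{2}\left( v_{t\alpha} + v_{t\beta} \right) + B u_t \\ z_t = P_2 v_t \\ x_{t\alpha} = A_{\bar\gamma} x_t + P_1 z_{t\alpha} \\ x_{t\beta} = A_{\bar\gamma} x_t + P_1 z_{t\beta} \\ y_t = C x_t \end{cases} \qquad (4.10) $$

where $P = P_1 P_2$. The first two equations define a purely “anticausal” process,
whereas the last three equations define a causal zero depth process. Later in this
section we describe an optimal multiscale estimation algorithm that has precisely
this structure.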

Now let us turn to the representation of multiscale causal systems in (4.8). Here
we interpret the sequence $\sigma^{[i]}$ as the powers of the row operator and $\bar\gamma$ as the column
operator. Then again we consider row-by-row scanning to define a total ordering
of the 2-D index space. This corresponds to decomposing the transfer function H
according to the following two steps:

1. a smoothing along the considered horocycle (i.e. constant scale smoothing),
followed by

2. a top-down (i.e. coarse-to-fine) propagation.

2-D system theory for systems having separable denominator [4, 32] may again be
applied here. Rational transfer functions in this latter case are of the following form:

$$ H = C \left( I - \bar\gamma A_{\bar\gamma} \right)^{-1} P \left( I - \sigma A_{\sigma} \right)^{-1} B \qquad (4.11) $$

where it is understood that, in expanding such a formula into a power series, $\sigma^{i}$
should be replaced by $\sigma^{[i]}$. This latter unusual feature has as a consequence the fact
that no simple “time domain” translation of the “frequency domain” formula (4.11)
is available. However, if $A_{\sigma}$ is nilpotent, so that $(I - \sigma A_{\sigma})^{-1}$ is a finite series, we do
obtain the following explicit representation for what we refer to as the finite depth
case:


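$$ \begin{cases} x_{t\alpha} = A_{\bar\gamma} x_t + D\left(1, \sigma, \ldots, \sigma^{[k]}\right) u_{t\alpha} \\ x_{t\beta} = A_{\bar\gamma} x_t + D\left(1, \sigma, \ldots, \sigma^{[k]}\right) u_{t\beta} \\ y_t = C x_t \end{cases} \qquad (4.12) $$

where $D(1, \sigma, \ldots, \sigma^{[k]})$ is a linear combination of the listed operators.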

The dynamics (4.12) represent a finite-extent smoothing along each horocycle and
a generalized coarse-to-fine interpolation. For example, as discussed in Section 1, the
synthesis form of the Haar transform can be placed exactly in this form. It can also
be shown that stationary finite depth scalar transfer functions may be equivalently
expressed in the following ARMA form

$$ H = A^{-1} D \qquad (4.13) $$

where A is a causal function of finite support and $D = D(1, \sigma, \ldots, \sigma^{[k]})$ is as in (4.12).
This ARMA form includes as a special case the AR modeling filters for “isotropic”
processes introduced in Section 3.

The preceding development, as well as the interpretation of the synthesis form of
the wavelet transform, provides ample motivation for the studies in [16, 17, 18, 19,
20, 48, 52] of properties and estimation algorithms for multiscale state models of the
form:

$$ x(t) = A(t)\, x(t\bar\gamma) + B(t)\, w(t) \qquad (4.14) $$

$$ y(t) = C(t)\, x(t) + v(t) \qquad (4.15) $$

where w(t) and v(t) are independent vector white noise processes with covariances
I and R(t), respectively. The model class described in (4.14), (4.15) represents a
noise-driven generalization of the zero-depth, causal, stationary model (4.12).
Specifically, we obtain such a stationary model if all of the parameters A, B, C, and R are
constant. There are, however, important reasons to consider the more general case
(and, in addition, its consideration does not complicate our analysis). First of all,
one important intermediate case is that in which the system parameters are constant
at each scale but may vary from scale to scale. If we let m(t) denote the scale, i.e.
the horocycle, on which the node t lies, we abuse notation in this case by writing
A(t) = A(m(t)), etc. Such a model is useful for capturing the fact that data may be
available at only particular scales (i.e. $C(m) \ne 0$ only for particular values of m);
for example, in the original context of wavelet analysis, we actually have only one
measurement set, corresponding to C(m) being nonzero only at the finest scale in our
representation (see the footnote below). Also, by varying A(m), B(m), and R(m) with m we can capture
a variety of scale-dependent effects. For example, dominant scales might correspond
to scales with larger values of B(m). Also, by building a geometric decay in scale
into B(m) it is possible to capture 1/f-like, fractal behavior as shown and studied
in [16, 47, 50]. Finally, the general case of t-varying parameters has a number of
potential uses. For example, such a form for C(t) is clearly required to capture the
situation depicted in Figure 3, in which fine scale measurements are not available at
all locations. Also, it is our belief that such models will prove useful in modeling
transient events localized in scale and time or space and in capturing changing signal
or image characteristics.
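The coarse-to-fine character of (4.14), (4.15) can be sketched with a short simulation on a finite tree. Everything about the interface below is an assumption made for illustration: the state is scalar, the parameters depend only on scale and are passed as functions of m, scale m is stored as an array of length $2^m$, and node j has children 2j and 2j+1. The example parameters use a geometric decay in B(m) and a measurement matrix that is nonzero only at the finest scale, two of the cases mentioned above.

```python
import numpy as np


def simulate_tree(A, B, C, R, M, x0_var, rng):
    """Simulate x(t) = A(m) x(parent) + B(m) w(t), y(t) = C(m) x(t) + v(t).

    A, B, C, R are callables of the scale index m (hypothetical interface);
    w has unit variance and v has variance R(m).
    Returns lists x[m], y[m] for m = 0..M, each of length 2**m.
    """
    x = [np.sqrt(x0_var) * rng.standard_normal(1)]
    for m in range(1, M + 1):
        parent = np.repeat(x[m - 1], 2)          # each node passes its value to two children
        x.append(A(m) * parent + B(m) * rng.standard_normal(2 ** m))
    y = [C(m) * x[m] + np.sqrt(R(m)) * rng.standard_normal(2 ** m)
         for m in range(M + 1)]
    return x, y


M = 4
rng = np.random.default_rng(0)
x, y = simulate_tree(
    A=lambda m: 0.9,
    B=lambda m: 2.0 ** (-m / 2),                 # geometric decay in scale
    C=lambda m: 1.0 if m == M else 0.0,          # data only at the finest scale
    R=lambda m: 0.1,
    M=M, x0_var=1.0, rng=rng,
)
```

With C(m) = 0 at the coarser scales the arrays `y[m]` for m < M contain only noise, which is one way of representing the absence of data at those scales.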

As with standard temporal state models, the second-order statistics of x(t) are
easily computed. In particular, the covariance $P_x(t) = E[x(t)x^T(t)]$ evolves according
to a Lyapunov equation on the tree:

$$ P_x(t) = A(t)\, P_x(t\bar\gamma)\, A^T(t) + B(t) B^T(t) \qquad (4.16) $$

Specializing to the case in which A(t) = A(m(t)) and B(t) = B(m(t)), we can obtain
a covariance that depends only on scale, i.e. $P_x(t) = P_x(m(t))$, and indeed
in this case we have a standard Lyapunov equation in scale:

$$ P_x(m+1) = A(m)\, P_x(m)\, A^T(m) + B(m) B^T(m) \qquad (4.17) $$
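The Lyapunov equation in scale is straightforward to iterate numerically. The sketch below follows the indexing of (4.17) as printed; the function name and the use of Python lists of matrices for A(m) and B(m) are assumptions for the example.

```python
import numpy as np


def covariance_by_scale(A, B, P0, M):
    """Iterate P(m+1) = A(m) P(m) A(m)^T + B(m) B(m)^T for m = 0, ..., M-1.

    A[m] and B[m] are 2-D arrays indexed as in (4.17); P0 is the state
    covariance at the coarsest scale. Returns [P(0), ..., P(M)].
    """
    P = [np.asarray(P0, dtype=float)]
    for m in range(M):
        P.append(A[m] @ P[m] @ A[m].T + B[m] @ B[m].T)
    return P


# Scalar constant-coefficient example started at the steady-state value
# b**2 / (1 - a**2) = 4/3, so the covariance should not change with scale.
A = [np.array([[0.5]])] * 5
B = [np.array([[1.0]])] * 5
P = covariance_by_scale(A, B, [[4.0 / 3.0]], 5)
```

Starting from any other value of P(0), the iterates converge to the same fixed point whenever A is stable, which is the constant-coefficient situation.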

Footnote: It is important to emphasize here that the wavelet transform of this fine scale
measurement, which we use here as well as in the sequel, does not correspond to
measurements as in (4.15) at several scales. Rather, (4.15) corresponds to independent
measurements at various nodes.

Also, as shown in [16, 19], the full covariance function in this case is given by

$$ K_{xx}(t,s) = \Phi(m(t), m(s \wedge t))\, P_x(m(s \wedge t))\, \Phi^T(m(s), m(s \wedge t)) \qquad (4.18) $$

where $\Phi(m,n)$ is the state transition matrix associated with A(m). Specializing
further to the constant coefficient case we have the following [16, 19]: if A is stable
and if $P_x$ is the unique solution to the algebraic Lyapunov equation

$$ P_x = A P_x A^T + B B^T \qquad (4.19) $$

then our state model generates the stationary covariance

$$ K_{xx}(t,s) = A^{d(t, s \wedge t)}\, P_x\, (A^T)^{d(s, s \wedge t)} \qquad (4.20) $$

Note that in the scalar case our constant coefficient model is exactly the AR(1) model
introduced in the preceding section and indeed (4.19)-(4.20) reduce to

$$ P_x = \frac{B^2}{1 - A^2}, \qquad K_{xx}(t,s) = P_x A^{d(t,s)} \qquad (4.21) $$
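The scalar stationary covariance can be checked numerically: for a constant-coefficient scalar model started in steady state, the covariance between two nodes on the same horocycle should equal $P_x A^{d(t,s)}$, where d(t,s) is the number of edges on the path between them. The sketch below builds the exact covariance of one scale by propagating from the root; the node numbering (children of n are 2n and 2n+1) and the function names are assumptions for this example.

```python
import numpy as np


def tree_distance(i, j):
    """Number of edges between nodes i and j on the same scale of a dyadic tree."""
    steps = 0
    while i != j:
        i //= 2          # move both nodes to their parents
        j //= 2
        steps += 1
    return 2 * steps


def leaf_covariance(a, b, M):
    """Exact covariance of the 2**M states at scale M for x(t) = a x(parent) + b w(t)."""
    p = b ** 2 / (1.0 - a ** 2)                  # steady-state variance
    K = np.array([[p]])
    for m in range(M):
        up = np.kron(np.eye(2 ** m), np.ones((2, 1)))   # copies each parent to two children
        K = a ** 2 * up @ K @ up.T + b ** 2 * np.eye(2 ** (m + 1))
    return K, p


a, b, M = 0.8, 0.6, 3
K, p = leaf_covariance(a, b, M)
expected = np.array([[p * a ** tree_distance(i, j) for j in range(2 ** M)]
                     for i in range(2 ** M)])
```

The matrices `K` and `expected` agree, which illustrates that in the scalar case the covariance depends only on the tree distance between the two nodes.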

In the vector case (4.20) is stationary but not, in general, isotropic. However, it is
interesting to note that we do obtain an isotropic model if $A P_x = P_x A^T$, precisely
the condition arising in the study of temporally-reversible vector stochastic models
[1].

Let us turn now to the problem of estimating the state of (4.14) based on the
measurements (4.15). Note that this framework allows us to consider not only the
fusion of measurements at multiple resolutions but also the reconstruction of processes
at multiple scales. Indeed, in this way we can consider the resolution-accuracy tradeoff
directly and can also assess the impact of fine-scale fluctuations on the accuracy of
coarser scale reconstructions, a problem of some importance in applications such as
the fusion of satellite IR measurements of ocean temperature variations with point
measurements from ships in order to produce temperature maps at an intermediate
scale. To be specific, in the following development we consider the problem of optimal
estimation on a finite portion of T. This corresponds to estimation of a temporal
process on a compact interval, so that there is a coarsest scale (and hence a top to
our subtree) denoted by m = 0, and a finest scale, denoted by m = M, at which
measurements may be available and/or reconstructions desired. As developed in
[16, 17, 18, 52], the model structure (4.14), (4.15) leads to three efficient, highly
parallelizable algorithmic structures for optimal multiscale estimation. The first of these
is an iterative algorithm taking advantage of the fact that (4.14) defines a Markov
random field structure on T. Specifically, let Y denote the full set of measurements
at all scales. Then, thanks to Markovianity, we have that

$$ E[x(t) \mid Y] = E\{ E[x(t) \mid x(t\bar\gamma), x(t\alpha), x(t\beta), Y] \mid Y \} $$

$$ \qquad\qquad = E\{ E[x(t) \mid x(t\bar\gamma), x(t\alpha), x(t\beta), y(t)] \mid Y \} \qquad (4.22) $$

where the second equality in (4.22) states that given $x(t\bar\gamma)$, $x(t\alpha)$, $x(t\beta)$, only the
measurement at node t provides additional useful information about x(t). From
(4.22) we can then obtain an explicit representation for the optimal estimate of x(t)
in terms of the optimal estimates at its parent node, $t\bar\gamma$, at its immediate descendant
nodes, $t\alpha$ and $t\beta$, and the single measurement at node t. This implicit specification
is then perfectly set up for solution via Gauss-Seidel or Jacobi iteration, which can
be organized to have exactly the same structure as multigrid relaxation algorithms,
with coarse-to-fine and fine-to-coarse sweeps that in multigrid terminology [11, 12,
26, 29, 37, 39] lead to so-called V- and W-cycle iterations. Furthermore, in such
iterations all of the calculations at any particular scale can be carried out in parallel.
In addition, this methodology carries over completely not only to the case of nonzero
depth models as in (4.12), with the additional inter-node connectivity implied by the
coupling introduced by the horocycle-smoothing operator D, but also to state models
on more general lattices corresponding to the interpretation of (1.11) as defining a
scale-to-scale dynamic relationship for any finitely-supported QMF pair h(n), g(n)
and thus for any compactly-supported wavelet transform. We refer the reader to
[16, 19] for details and further development of this multigrid estimation methodology.

A second estimation structure applies to the case in which all system parameters
depend only on scale (i.e. A(t) = A(m(t)), etc.). In this case, as shown in [16,
17, 19], the Haar transform, applied to each scale of the state process x(t) and the
measurement data y(t), yields a decoupled set of estimation problems for each of the
scale components. Specifically, let x(m) denote the vector of all $2^m$ values of x(t)
at the mth scale, and define y(m), w(m), and v(m) similarly. Then in this case (4.14),
(4.15) can be rewritten in scale-to-scale form:


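$$ x(m+1) = \mathcal{A}_{m+1}\, x(m) + \mathcal{B}_{m+1}\, w(m+1) \qquad (4.23) $$

$$ y(m) = \mathcal{C}_{m}\, x(m) + v(m) \qquad (4.24) $$

where

$$ \mathcal{A}_{m+1} = \begin{bmatrix} A(m+1) & 0 & \cdots & 0 \\ A(m+1) & 0 & \cdots & 0 \\ 0 & A(m+1) & \cdots & 0 \\ 0 & A(m+1) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A(m+1) \\ 0 & 0 & \cdots & A(m+1) \end{bmatrix} \qquad (4.25) $$

$$ \mathcal{B}_{m+1} = \mathrm{diag}\left( B(m+1), \ldots, B(m+1) \right) \qquad (4.26) $$

$$ \mathcal{C}_{m} = \mathrm{diag}\left( C(m), \ldots, C(m) \right) \qquad (4.27) $$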
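The block matrices in (4.25)-(4.27) have a simple Kronecker-product structure, which the following sketch makes explicit. The function name and argument order are assumptions for illustration; the inputs are the per-node matrices A(m+1), B(m+1), and C(m), and the scale index m.

```python
import numpy as np


def scale_to_scale_matrices(A_next, B_next, C_m, m):
    """Build the block matrices of (4.25)-(4.27) for the step from scale m to m+1."""
    n_coarse, n_fine = 2 ** m, 2 ** (m + 1)
    # Each coarse node feeds its two children: a 2^(m+1) x 2^m pattern of ones.
    copy_to_children = np.kron(np.eye(n_coarse), np.ones((2, 1)))
    script_A = np.kron(copy_to_children, A_next)
    script_B = np.kron(np.eye(n_fine), B_next)     # diag(B(m+1), ..., B(m+1))
    script_C = np.kron(np.eye(n_coarse), C_m)      # diag(C(m), ..., C(m))
    return script_A, script_B, script_C


sA, sB, sC = scale_to_scale_matrices(
    np.array([[2.0]]), np.array([[3.0]]), np.array([[5.0]]), m=1
)
```

For a scalar state and m = 1 the first matrix has the rows (2, 0), (2, 0), (0, 2), (0, 2), which is the pattern displayed in (4.25).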


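Note that x(m) has half as many elements as x(m+1), reflecting the fine-to-coarse
decimation that occurs in multiscale representations. As shown in [16, 19], the
covariances of x(m) and y(m), as well as the cross-covariance between x at different scales,
have (block-) eigenstructures specified by the Haar transform. For example, if x(t) is
a scalar process and we look at x(3), which is 8-dimensional, we find that the
covariance of this vector has as its eigenvectors the columns of the following orthonormal
matrix, corresponding to the (8-dimensional) discrete Haar basis consisting of vectors
representing “dilated, translated, and scaled” versions of the vector $[1, -1]^T$:

$$ \begin{bmatrix}
\frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
-\frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & -\frac{1}{2} & 0 & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & -\frac{1}{\sqrt{2}} & 0 & 0 & -\frac{1}{2} & 0 & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{2} & -\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{2} & -\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{2} & -\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & -\frac{1}{2} & -\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}}
\end{bmatrix} \qquad (4.28) $$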


Analogous bases can be defined for any dimension that is a power of two, and
when x(t) is a vector each of the elements of matrices as in (4.28) is replaced by a
correspondingly-scaled version of the identity matrix of dimension equal to that of
x(t) (e.g. the (1, 1) block of such a matrix would be $(1/\sqrt{2})I$).
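The eigenstructure claim can be checked numerically for the scalar constant-coefficient model. The sketch below builds an orthonormal Haar matrix with the column ordering described in the text (finest differences first, overall average last) and applies it to a covariance that depends only on tree distance. The recursive construction, the function names, and the parameter values are assumptions made for illustration.

```python
import numpy as np


def haar_matrix(m):
    """Orthonormal 2**m x 2**m Haar matrix: finest differences first, average last."""
    V = np.array([[1.0]])
    for _ in range(m):
        n = V.shape[0]
        # New finest-scale difference vectors, one per pair of samples.
        detail = np.kron(np.eye(n), np.array([[1.0], [-1.0]])) / np.sqrt(2.0)
        # Previous basis vectors, dilated by repeating each entry twice.
        coarse = np.kron(V, np.array([[1.0], [1.0]])) / np.sqrt(2.0)
        V = np.hstack([detail, coarse])
    return V


def tree_distance(i, j):
    """Number of edges between nodes i and j on the same scale of a dyadic tree."""
    steps = 0
    while i != j:
        i, j, steps = i // 2, j // 2, steps + 1
    return 2 * steps


a, p = 0.8, 1.0                                   # assumed AR(1) parameter and variance
V = haar_matrix(3)
K = np.array([[p * a ** tree_distance(i, j) for j in range(8)] for i in range(8)])
D = V.T @ K @ V                                   # covariance in the Haar basis
```

The matrix `D` is diagonal up to rounding, so the Haar vectors are eigenvectors of the covariance of x(3) for this model.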

As a consequence of these observations, one would expect considerable
simplification if we consider the Haar-transformed version of our estimation problem.
Specifically, define the transformed variables

$$ s(m) = V_x^T(m)\, x(m), \qquad z(m) = V_y^T(m)\, y(m) \qquad (4.29) $$

where $V_x(m)$ ($V_y(m)$) is the block-Haar transform matrix of block-size equal to the
dimension of x(t) (y(t)). In this transformed representation the system and
measurement equations block-decouple completely. Specifically, the vector s(m) can be
decomposed into $2^m$ subvectors each of the same dimension as x(t), and we index
these as $s_{00}(m)$, $s_{01}(m)$, and $s_{ij}(m)$ for $1 \le i \le m-1$, $1 \le j \le 2^i$. Here $s_{00}(m)$
is the component corresponding to the right-most (block) basis component in $V_x(m)$
(refer to (4.28)), i.e. it is the average of the x(t) at the mth horocycle (scaled by
$2^{m/2}$); $s_{01}(m)$ is then the coarsest resolution first difference coefficient (see the
next-to-last column in (4.28)), while for $i \ge 1$, the $s_{ij}$ correspond to the ith resolution
first difference coefficients (note in (4.28) that there are four such coefficients at the
finest resolution and two at the next, coarser scale). In a similar fashion we define the
components of z(m). With these definitions we find that we have a set of completely
decoupled standard dynamic systems in the time-like variable m:

$$ s_{ij}(m+1) = A(m+1)\, s_{ij}(m) + B(m+1)\, w_{ij}(m+1), \qquad 0 \le i \le m-1 \qquad (4.30) $$

$$ s_{mj}(m+1) = B(m+1)\, w_{mj}(m+1) \qquad (4.31) $$

$$ z_{ij}(m) = C(m)\, s_{ij}(m) + v_{ij}(m) \qquad (4.32) $$

Here $w_{ij}(m)$ and $v_{ij}(m)$ are white in all three indices, with covariances I and R(m),
respectively.

Recall that the dimension of x(m) increases with m, indicative of the increasing
detail available at finer scales. In the transformed basis this is made absolutely
explicit in that we see that the dynamics (4.30), (4.31) consist of two parts: the
interpolation of coarse features to finer scales (4.30) and the initiation, at each scale,
of new components (4.31) representing levels of detail that can be captured at this (but
not at any coarser) scale. Thus for any pair of indices i, j we have a dynamic system
in m, initiated at scale m = i, and thus we can use standard state space smoothing
techniques independently for each such system, leading to a highly parallel algorithm
in which (a) we transform the available measurement data y(m) to obtain z(m) as in
(4.29); (b) we then use standard smoothing techniques on the individual components;
and (c) we inverse transform the resulting estimates of s(m) to obtain the optimal
estimates of x(t) at all nodes. Note that the fact that each $s_{ij}$ is initiated only at
the ith scale implies that the corresponding smoother works on data only from this
and finer scales, leading to a set of smoothing algorithms of different (scale) length.
This is consistent with the intuition that data at any particular scale provide useful
information at that scale and at coarser scales (by averaging) but not at finer scales.

We refer the reader to [16, 17] for details of this procedure and for its generalization
to the case of nonzero-depth models and to arbitrary lattices associated with other
wavelet transforms, i.e. to dynamic systems as in (1.11) (and a significant extension of
these) with other choices for the QMFs h(n) and g(n) than the Haar pair. Again one
finds that the wavelet transform, modified appropriately to deal with the windowing
effect of smoothing multiscale measurements over a compact interval, yields a set
of decoupled smoothing problems in scale. Since the wavelet transform can be
computed quite quickly, this leads to an extremely efficient overall procedure. We note
also that by specializing our model to the case in which process noise variances
decrease exponentially in scale we obtain a generalization of the procedure developed in
[51] for the estimation of 1/f-like processes. In particular, what we have just described
provides a procedure for fusing multiresolution measurements of such processes.

Finally, we note that the interpretation of our models as scale-to-scale Markov processes
and the dual viewpoint that the wavelet transform for such a model whitens the data
in scale suggest the problems of (a) optimizing wavelet transforms in order to achieve
maximal scale-to-scale decorrelation; and (b) approximating stochastic processes by
such scale-to-scale Markov models. The former of these problems is discussed in [27]
and the latter is touched upon in [16, 17, 27]. In particular, in [17, 27] we construct
approximate models of this type for a standard first-order Gauss-Markov process (i.e.
one with an exponentially decaying temporal correlation function) and demonstrate their
fidelity in several ways, including their use as the basis for the fusion and smoothing
of multiresolution measurements of Gauss-Markov processes. In Figure 7 we depict
the correlation function of such a unit-variance first-order Gauss-Markov process, i.e.
viewing a set of $2^m$ samples of this process as the values of x(m), Figure 7 displays the
matrix of correlation coefficients of the elements of this vector. In contrast, in Figure 8
we display the correlation coefficients of the elements of s(m) obtained as in (4.29),
but using an 8-tap QMF h(n) rather than the 2-tap h(n), i.e. first the corresponding
orthogonal matrix for this h(n) is applied to x(m), and then the resulting covariance
of s(m) is modified by dividing its (i, j) element by the square root of the product
of the (i, i) and (j, j) elements, yielding the matrix of correlation coefficients. As one
would expect from the work on transforming kernels of integral operators in [10], the
result is an almost-diagonal matrix, implying nearly perfect scale-to-scale whitening.
This is further substantiated in [16, 17] (see also Figure 3) by demonstration of the
high quality estimates produced if such remaining inter-scale correlation is neglected.
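The normalization to correlation coefficients, and the whitening effect itself, can be illustrated with a short computation. The sketch below differs from the experiment described above in one stated respect: it uses the 2-tap Haar transform rather than an 8-tap QMF, because the 8-tap filter coefficients are not given here. The decay rate `beta`, the size m = 6, and the helper names are assumptions for the example.

```python
import numpy as np


def haar_matrix(m):
    """Orthonormal 2**m x 2**m Haar matrix (finest differences first, average last)."""
    V = np.array([[1.0]])
    for _ in range(m):
        n = V.shape[0]
        detail = np.kron(np.eye(n), np.array([[1.0], [-1.0]])) / np.sqrt(2.0)
        coarse = np.kron(V, np.array([[1.0], [1.0]])) / np.sqrt(2.0)
        V = np.hstack([detail, coarse])
    return V


def correlation_coefficients(cov):
    """Divide element (i, j) by the square root of the product of elements (i, i) and (j, j)."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def mean_offdiag(c):
    """Mean absolute value of the off-diagonal entries."""
    n = c.shape[0]
    return (np.abs(c).sum() - np.abs(np.diag(c)).sum()) / (n * (n - 1))


m, beta = 6, 0.1                                   # assumed size and decay rate
idx = np.arange(2 ** m)
K = np.exp(-beta * np.abs(idx[:, None] - idx[None, :]))   # unit-variance Gauss-Markov covariance
V = haar_matrix(m)
before = correlation_coefficients(K)
after = correlation_coefficients(V.T @ K @ V)
```

Comparing `mean_offdiag(before)` with `mean_offdiag(after)` gives a rough scalar measure of how much correlation the transform removes; longer filters would be expected to do better than the Haar pair used in this sketch.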

While the preceding algorithm provides a very efficient procedure for multiscale
fusion, its use does require that all model parameters vary only with scale and thus are
constant on each horocycle. For example, this implies that if any measurement is
available at any particular scale, then a full set of measurements is available at that scale.
In contrast, the result shown in Figure 3 (a), (b) corresponds to a situation in which we
have only sparse, fine scale measurements from a 1/f-like model of the type described
in [50, 51], together with full-coverage, but coarser-resolution measurements, while
Figure 3 (c) and (d) correspond to the analogous situation for a first-order
Gauss-Markov process. In particular, in each case 16 fine scale measurements are taken at
each end of the 64-point signal, together with coarse measurements of 4-point averages
of this signal. While the wavelet-transform-based smoothing algorithm does not apply
to this case, the multigrid method described previously does (using in the case of (c)
and (d) an approximate model of the form of (4.14), (4.15) for the Gauss-Markov
process), as does the following approach, which not only provides an extremely efficient
algorithm for multiscale fusion but also illuminates several system-theoretic issues on
dyadic trees. Specifically, as developed in detail in [16, 18, 19], there is a nontrivial
generalization of the so-called Rauch-Tung-Striebel (RTS) smoothing algorithm for
causal state models [42]. Recall that the standard RTS algorithm involves a forward
Kalman filtering sweep followed by a backward sweep to compute the smoothed
estimates. The generalization to our models on trees has the same structure, with several
important differences. First, for the standard RTS algorithm the procedure is
completely symmetric with respect to time, i.e. we can start with a reverse-time Kalman
filtering sweep followed by a forward smoothing sweep. For processes on trees, the
Kalman filtering sweep must proceed from fine-to-coarse followed by a coarse-to-fine
smoothing  sweep®. 

*The reason for this is not very complex. To allow the measurement on the tree at one point to
contribute to the estimate at another point on the same level of the tree, one must use a recursion
that first moves up and then down the tree.

Furthermore, the Kalman filtering sweep is somewhat more complex for processes
on trees. In particular, one full step of the Kalman filter recursion involves a measurement
update, two parallel backward predictions (corresponding to backward prediction
along both of the paths descending from a node), and the fusion of these
predicted estimates. Specifically, as depicted in Figure 9, the fine-to-coarse Kalman
filter step has as its goal the recursive computation of $\hat{x}(t|t)$, the best estimate of
$x(t)$ based on data in the descendant subtree with root node $t$. As in usual Kalman
filtering, if $\hat{x}(t|t+)$ denotes the best estimate based on all of the same data except the
measurement at node $t$, we obtain a straightforward update step to produce


$$\hat{x}(t|t) = \hat{x}(t|t+) + K(t)\,[y(t) - C(t)\,\hat{x}(t|t+)] \tag{4.33}$$

$$K(t) = P(t|t+)\,C^T(t)\,V^{-1}(t) \tag{4.34}$$

$$V(t) = C(t)\,P(t|t+)\,C^T(t) + R(t) \tag{4.35}$$

and

$$P(t|t) = [I - K(t)\,C(t)]\,P(t|t+) \tag{4.36}$$
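
The update step (4.33)-(4.36) has the same algebraic form as the standard Kalman measurement update. A minimal NumPy sketch follows; the function and argument names are hypothetical, and the scalar numbers are chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def measurement_update(x_pred, P_pred, y, C, R):
    """Measurement update at one node: x_pred, P_pred play the roles of x(t|t+), P(t|t+)."""
    V = C @ P_pred @ C.T + R                          # innovation covariance, (4.35)
    K = P_pred @ C.T @ np.linalg.inv(V)               # gain, (4.34)
    x_upd = x_pred + K @ (y - C @ x_pred)             # updated estimate, (4.33)
    P_upd = (np.eye(len(x_pred)) - K @ C) @ P_pred    # updated error covariance, (4.36)
    return x_upd, P_upd

# Scalar illustration (made-up numbers): prior estimate 0 with variance 1,
# measurement y = 2 with unit noise variance.
x_upd, P_upd = measurement_update(
    np.array([0.0]), np.array([[1.0]]),
    np.array([2.0]), np.array([[1.0]]), np.array([[1.0]]),
)
```

With equal prior and noise variances the gain is 1/2, so the estimate moves halfway toward the measurement and the variance is halved.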


Here $P(t|t)$ and $P(t|t+)$ are the error covariances associated with $\hat{x}(t|t)$ and $\hat{x}(t|t+)$,
respectively. Working back one step, we see that $\hat{x}(t|t+)$ represents the fusion of information
in the subtree under $t\alpha$ and under $t\beta$. Thus we might expect that $\hat{x}(t|t+)$ could
be computed from the one-step-backward-predicted estimates $\hat{x}(t|t\alpha)$ and $\hat{x}(t|t\beta)$ of
$x(t)$ based separately on the information in the subtrees with root $t\alpha$ and root $t\beta$,
respectively. Indeed, as shown in [16, 19]

$$\hat{x}(t|t+) = P(t|t+)\,[P^{-1}(t|t\alpha)\,\hat{x}(t|t\alpha) + P^{-1}(t|t\beta)\,\hat{x}(t|t\beta)] \tag{4.37}$$

$$P(t|t+) = [P^{-1}(t|t\alpha) + P^{-1}(t|t\beta) - P_x^{-1}(t)]^{-1} \tag{4.38}$$
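
The fusion step can be sketched in information form. The code below assumes (4.37)-(4.38) read as a sum of the two predicted information matrices minus the prior information $P_x^{-1}(t)$; the names `fuse`, `x_a`, `x_b` and the scalar values are illustrative assumptions.

```python
import numpy as np

def fuse(x_a, P_a, x_b, P_b, P_x):
    """Fuse the predictions from the two child subtrees, counting the prior only once."""
    inv = np.linalg.inv
    P_plus = inv(inv(P_a) + inv(P_b) - inv(P_x))            # (4.38)
    x_plus = P_plus @ (inv(P_a) @ x_a + inv(P_b) @ x_b)     # (4.37)
    return x_plus, P_plus

# Scalar illustration (made-up numbers): prior variance 2, each prediction has variance 1.
x_plus, P_plus = fuse(
    np.array([1.0]), np.array([[1.0]]),
    np.array([2.0]), np.array([[1.0]]),
    np.array([[2.0]]),
)
```

Here the fused information is 1 + 1 - 1/2 = 3/2, so the fused variance is 2/3; without the subtraction the prior would be counted twice.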

Finally, to complete the recursion, $\hat{x}(t|t\alpha)$ and $\hat{x}(t|t\beta)$ are computed from $\hat{x}(t\alpha|t\alpha)$
and $\hat{x}(t\beta|t\beta)$, respectively, in identical fashions. Specifically, each of these calculations
represents a one-step-backward prediction. It is not surprising, then, that a backward
version of the model (4.14) plays a role here. Indeed, as shown in [16]


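$$\hat{x}(t|t\alpha) = F(t\alpha)\,\hat{x}(t\alpha|t\alpha) \tag{4.39}$$

$$P(t|t\alpha) = F(t\alpha)\,P(t\alpha|t\alpha)\,F^T(t\alpha) + \mathcal{Q}(t\alpha) \tag{4.40}$$

where

$$F(t) = A^{-1}(t)\,[I - B(t)\,B^T(t)\,P_x^{-1}(t)] \tag{4.41}$$

$$\mathcal{Q}(t) = A^{-1}(t)\,B(t)\,Q(t)\,B^T(t)\,A^{-T}(t) \tag{4.42}$$

$$Q(t) = I - B^T(t)\,P_x^{-1}(t)\,B(t) \tag{4.43}$$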
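
A sketch of the one-step backward prediction is given below. It assumes the backward-model quantities take the form $F = A^{-1}[I - BB^T P_x^{-1}]$, $Q = I - B^T P_x^{-1} B$ and a prediction noise term $A^{-1} B Q B^T A^{-T}$, with a unit-covariance driving noise; all names and numbers are hypothetical. The scalar check uses a parent of variance 1 and a child generated with A = 0.9, so that the child also has variance 1.

```python
import numpy as np

def backward_predict(x_filt, P_filt, A, B, P_x):
    """Predict the parent state from a child's filtered estimate (child prior covariance P_x)."""
    n = A.shape[0]
    A_inv = np.linalg.inv(A)
    P_x_inv = np.linalg.inv(P_x)
    F = A_inv @ (np.eye(n) - B @ B.T @ P_x_inv)       # backward dynamics, (4.41)
    Q = np.eye(B.shape[1]) - B.T @ P_x_inv @ B        # backward noise covariance, (4.43)
    Q_cal = A_inv @ B @ Q @ B.T @ A_inv.T             # prediction noise term, (4.42)
    x_pred = F @ x_filt                               # (4.39)
    P_pred = F @ P_filt @ F.T + Q_cal                 # (4.40)
    return x_pred, P_pred

A = np.array([[0.9]])
B = np.array([[np.sqrt(0.19)]])     # chosen so that 0.9**2 * 1 + 0.19 = 1
P_x = np.array([[1.0]])             # prior variance at the child node

# Child known exactly: the parent estimate is 0.9 * child, with variance 1 - 0.81.
x_known, P_known = backward_predict(np.array([1.0]), np.zeros((1, 1)), A, B, P_x)
# Child known only through its prior: the parent prior variance is recovered.
_, P_prior = backward_predict(np.array([0.0]), P_x, A, B, P_x)
```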


The prediction (4.39)-(4.43) and update (4.33)-(4.36) steps correspond to the analogous
steps in the usual Kalman filter (although here we must use the backward model in
the prediction step), while the fusion step (4.37)-(4.38) has no counterpart in usual
Kalman filtering. The interpretation of (4.37)-(4.38) is that we are fusing together
two estimates each of which incorporates one set of information that is independent
of that used in the other (i.e. the measurements in the $t\alpha$ and $t\beta$ subtrees) and one
common information source, namely the prior statistics of $x(t)$. Eq. (4.38) ensures
that this common information is accounted for only once in the fused estimate. Once
the top of the overall tree is reached we, of course, have the optimal smoothed estimate
at that node. As shown in [16, 18, 19], it is then possible to compute the optimal
smoothed estimates in a recursive fashion moving down the tree, from coarse to fine.
This recursion combines the smoothed estimate $\hat{x}_s(t\bar{\gamma})$ with the filtered estimates
from the upward sweep to produce $\hat{x}_s(t)$:

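$$\hat{x}_s(t) = \hat{x}(t|t) + P(t|t)\,F^T(t)\,P^{-1}(t\bar{\gamma}|t)\,[\hat{x}_s(t\bar{\gamma}) - \hat{x}(t\bar{\gamma}|t)] \tag{4.44}$$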

Note that this algorithm also has a highly parallel, and in this case pyramidal, structure,
since all calculations on either the fine-to-coarse or coarse-to-fine sweep can be
computed in parallel.
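
One step of the coarse-to-fine sweep can be sketched as follows. This is an assumption-laden illustration: the gain is taken to have the RTS-type form $P(t|t)\,F^T(t)\,P^{-1}(t\bar{\gamma}|t)$, where $t\bar{\gamma}$ denotes the parent of $t$, and every name and number below is hypothetical.

```python
import numpy as np

def smooth_step(x_filt, P_filt, F, x_pred_parent, P_pred_parent, x_s_parent):
    """Combine the parent's smoothed estimate with the child's upward-sweep quantities."""
    J = P_filt @ F.T @ np.linalg.inv(P_pred_parent)    # assumed RTS-type smoothing gain
    return x_filt + J @ (x_s_parent - x_pred_parent)

F = np.array([[0.9]])                                   # backward dynamics (made-up value)
x_filt, P_filt = np.array([1.0]), np.array([[0.5]])     # child's filtered estimate
x_pred_parent = F @ x_filt                              # parent predicted from this child
P_pred_parent = F @ P_filt @ F.T + np.array([[0.19]])   # with an assumed noise term of 0.19

# The parent's smoothed estimate differs from the prediction, so the child is corrected.
x_s = smooth_step(x_filt, P_filt, F, x_pred_parent, P_pred_parent, np.array([1.2]))
# If the parent's smoothed estimate equals the prediction, nothing changes.
x_same = smooth_step(x_filt, P_filt, F, x_pred_parent, P_pred_parent, x_pred_parent)
```

Each node needs only its own stored quantities and its parent's smoothed estimate, which is why all nodes on one level can be processed in parallel.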

Equations  (4.34-4.36),  (4.38),  and  (4.40-4.43)  define,  in  essence  a  Riccati  equation 
on  the  dyadic  tree.  As  for  standard  Riccati  equations,  it  is  possible  to  relate  properties 
of  the  solution  of  this  equation  to  system-theoretic  properties.  For  example,  one  can 
show  that  suitably  defined  notions  of  uniform  complete  reachability  and  uniform 
complete  observability  imply  upper  and  lower  positive-definite  bounds  on  the  error 
covariance. Here, since the Riccati equation propagates up the tree, the analysis
of reachability and observability relates to systems defined recursively from fine-to-coarse
scale, i.e. noncausal systems as in the first two equations of (4.10). One
might  also  expect  that  one  could  obtain  results  on  the  stability  of  the  error  dynamics 
and  asymptotic  behavior  in  the  constant  parameter  case.  This  is  indeed  the  case,  but 
there  are  several  issues  that  complicate  the  analysis.  Specifically,  in  standard  Kalman 
filtering  analysis  the  Riccati  equation  for  the  error  covariance  can  be  viewed  simply  as 
the  covariance  of  the  error  equation,  which  can  be  analyzed  directly  without  explicitly 
examining  the  state  dynamics,  since  the  error  evolves  as  a  state  process  itself.  This  is 
not the case here in general. First, while the process $x(t)$ is defined recursively moving
down the tree, the filtered estimate $\hat{x}(t|t)$ is defined by a recursion in the opposite
direction. This difficulty cannot be overcome in general simply by reversing one of
these processes, as the reversal process does not, in general, produce a system driven
by white noise.* Also, unlike the standard situation, our Riccati equation explicitly
involves the prior state covariance $P_x(t)$, arising as we have seen to prevent the double
counting of prior information.

There is, however, a way in which these difficulties can be avoided, essentially by
setting $P_x^{-1}$ to zero. In particular, as discussed in [16, 18], if we do this in (4.33)-(4.43),
the estimates produced have the interpretation as maximum likelihood (ML)
estimates. A variation of the RTS algorithm we have described here uses this ML
procedure to propagate to the top of the tree, at which point prior information is then
incorporated, followed by the coarse-to-fine sweep (4.44). To see what happens to the
Riccati equation and error dynamics in this case, let us focus on the scale-varying
case, i.e. the case in which all parameters depend only on $m(t)$. In this case the same
is true of the error covariances, yielding the following Riccati equation in scale:

$$P_{ML}(m|m+1) = A^{-1}(m+1)\,P_{ML}(m+1|m+1)\,A^{-T}(m+1) + G(m+1)\,Q(m+1)\,G^T(m+1) \tag{4.45}$$

$$P_{ML}^{-1}(m|m) = 2\,P_{ML}^{-1}(m|m+1) + C^T(m)\,R^{-1}(m)\,C(m) \tag{4.46}$$

where

$$G(m) = -A^{-1}(m)\,B(m) \tag{4.47}$$

This Riccati equation differs from the usual equation only in the presence of the factor
of 2 in (4.46), representing the doubling of information arising in the fusion step. In
this case we can also write a direct fine-to-coarse state form for the ML estimation
error $\tilde{x}_{ML}(t|t) = x(t) - \hat{x}_{ML}(t|t)$:

$$\begin{aligned}
\tilde{x}_{ML}(t|t) ={}& \tfrac{1}{2}\,[I - K_{ML}(m(t))\,C(m(t))]\,A^{-1}(m(t)+1)\,[\tilde{x}_{ML}(\alpha t|\alpha t) + \tilde{x}_{ML}(\beta t|\beta t)] \\
& - \tfrac{1}{2}\,[I - K_{ML}(m(t))\,C(m(t))]\,G(m(t)+1)\,[w(\alpha t) + w(\beta t)] - K_{ML}(m(t))\,v(t)
\end{aligned} \tag{4.48}$$

$$K_{ML}(m) = P_{ML}(m|m)\,C^T(m)\,R^{-1}(m) \tag{4.49}$$

*In particular, the backward models used in [16, 18, 19] to write $x(t)$ in terms of $x(t\alpha)$ and in
terms of $x(t\beta)$ yield driving noises which are martingale differences with respect to the partial order
defined on the tree.
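
The scale recursion is simple to iterate numerically. The sketch below assumes the ML Riccati equation takes the form $P(m|m+1) = A^{-1}P(m+1|m+1)A^{-T} + GQG^T$ followed by $P^{-1}(m|m) = 2P^{-1}(m|m+1) + C^TR^{-1}C$, uses constant parameters, and picks scalar values (A = B = Q = C = R = 1, so G = -1) purely for illustration; the function name is hypothetical.

```python
import numpy as np

def ml_riccati_step(P_fine, A, G, Q, C, R):
    """One fine-to-coarse step: prediction, then fusion of two children plus update."""
    inv = np.linalg.inv
    A_inv = inv(A)
    P_pred = A_inv @ P_fine @ A_inv.T + G @ Q @ G.T           # prediction, (4.45)
    P_coarse = inv(2.0 * inv(P_pred) + C.T @ inv(R) @ C)      # factor of 2 from fusion, (4.46)
    K = P_coarse @ C.T @ inv(R)                               # ML gain, (4.49)
    return P_pred, P_coarse, K

one = np.array([[1.0]])
A, Q, C, R = one, one, one, one
G = -np.linalg.inv(A) @ one            # G = -A^{-1} B with B = 1, as in (4.47)

# A single step starting from unit error variance at the finer scale.
P_pred1, P_coarse1, K1 = ml_riccati_step(one, A, G, Q, C, R)

# Iterate toward the steady state of the constant-parameter recursion.
P = one
for _ in range(50):
    _, P, K = ml_riccati_step(P, A, G, Q, C, R)
```

For these particular values the fixed point satisfies P^2 + 2P - 1 = 0, i.e. P = sqrt(2) - 1, which the iteration reaches quickly.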

In [16, 18] we provide a detailed analysis of (4.45)-(4.49). In particular, the stability
of the error dynamics (4.48) under reachability and observability conditions is
established. The notion of stability, however, deserves further comment. Intuitively
what  we  would  like  stability  to  mean  is  that  the  state  of  the  recursion  up  the  tree 
decays  to  0  as  we  propagate  farther  and  farther  away  from  the  initial  level  of  the 
tree.  Note,  however,  that  as  we  move  up  the  tree  the  state  at  any  node  is  influenced 
by  a  geometrically  increasing  number  of  nodes  at  the  initial  level.  Thus  in  order  to 
study  asymptotic  stability  it  is  necessary  to  consider  an  infinite  dyadic  tree,  with  an 
infinite  set  of  initial  conditions  corresponding  to  all  nodes  at  the  initial  level.  The 
implications of this are most easily seen in the constant parameter case. In this case
we have that if $(A, B)$ is a reachable pair and $(C, A)$ observable, then

$$P_\infty = \tfrac{1}{2}A^{-1}P_\infty A^{-T} + \tfrac{1}{2}GQG^T - K_\infty\left(\tfrac{1}{2}CA^{-1}P_\infty A^{-T}C^T + \tfrac{1}{2}CGQG^TC^T + R\right)K_\infty^T \tag{4.50}$$

where

$$K_\infty = P_\infty\,C^T R^{-1} \tag{4.51}$$

Moreover, the autonomous dynamics of the steady-state ML filter, i.e.

$$e(t) = \tfrac{1}{2}\,(I - K_\infty C)\,A^{-1}\,(e(\alpha t) + e(\beta t)) \tag{4.52}$$

is exponentially $\ell_2$ stable, i.e. the $\ell_2$ norm of all values of $e(t)$ along an entire horocycle
converges exponentially to zero as $m(t) \to 0$. As shown in [16, 18] this is equivalent
to all eigenvalues of the Kalman filter error dynamics matrix

$$\tfrac{1}{2}\,(I - K_\infty C)\,A^{-1} \tag{4.53}$$

having  magnitude  less  than 
5 Conclusions

In  this  paper  we  have  outlined  a  mathematical  framework  for  the  multiresolution 
modeling  and  analysis  of  stochastic  processes.  As  we  have  discussed,  the  theory  of 
multiscale  signal  analysis  and  wavelet  transforms  leads  naturally  to  the  investigation 
of  multiscale  statistical  representations  and  dynamic  models  on  dyadic  trees  and 
lattices.  The  rich  structure  of  the  dyadic  tree  requires  that  we  take  some  care  in  the 
specification  of  such  models  and  in  the  generalization  of  standard  time  series  notions. 
In  particular,  we  have  seen  that  in  this  context  there  are  two  natural  concepts  of  shift 
invariance  which  provide  new  ways  in  which  to  capture  notions  of  scale-invariant 
statistical descriptions. In addition, the observation that the scale variable is time-like
in nature leads to a natural notion of “causal” dynamics in scale: from fine to
coarse; however the tree provides only a partial ordering of points, requiring that we
take some care in defining the “past”.

In part of our work we have described the multiscale autoregressive modeling of
isotropic processes, i.e. processes satisfying our stronger notion of statistical shift-invariance.
As we have seen, the usual AR representation of time series is not a
particularly convenient one thanks both to the geometric explosion of points in the
“past” as we increase system order and to the nonlinear constraints isotropy imposes
on the AR coefficients. In contrast, we have seen that it is possible to construct a
generalization of the reflection-coefficient-based lattice representation for such models,
including generalized Levinson and Schur recursions. As we have illustrated, such
models can be used to generate fractal-like signals.

The other part of our work was motivated by our weaker notion of stationarity
which in essence says that the correlation between two values in our multiscale representation
depends on the difference in scale and location of the two points. As we
have seen, this framework leads to state models evolving from coarse-to-fine scales on
dyadic trees. We have described some of our work on a basic system theory for such
models and have also discussed an estimation framework that allows us to capture
the fusion of measurements at differing resolutions. In addition, the structure of these
models leads to several extremely efficient and highly parallel estimation structures:
a multiscale iterative algorithm that can be arranged so as to have the same form
as well-known multigrid algorithms for solving partial differential equations; an algorithm
using wavelet transforms to decouple the estimation procedure into a large
set of far simpler parallel estimation algorithms; and a pyramidal algorithm that
introduces a generalization of the Kalman filter and the associated Riccati equation.

As we have discussed and illustrated, these models appear to be useful for a rich
variety of processes including the 1/f-like models as introduced in [50, 51] and standard
first-order Gauss-Markov processes. Much, of course, remains to be done in
developing this theory, in investigating the processes that can be conveniently and
accurately represented within this framework, and in applying these results to problems
of practical importance such as sensor fusion, noise rejection, multisensor or
multiframe data registration and mapping, and segmentation. Among the theoretical
topics under investigation are the development of model fitting and likelihood
function-based methods for parameter estimation and segmentation and the development
of a detailed theory of approximation of stochastic processes including a specification
of those processes that can be “well”-approximated by models of the type
we have introduced. Of particular interest is the dynamic interpretation of so-called
wave packet transforms [21] in which the wavelet coefficients are subjected to further
decomposition through the same filter pair used in the wavelet transform. Viewing
this from our dynamic synthesis perspective, this would appear to correspond to a
class of higher-order models. Identifying and analyzing this model class, however,
remains for the future.
[Figure 1 image; axis label: translational shift]

Figure 1: The dyadic tree, in which each level of the tree corresponds to
a single scale in a multiscale representation. The nodes here correspond to
scale/shift pairs (m,n).
Figure 3: Illustrating multiscale data fusion using the techniques described
in Section 4. In (a) and (b) a signal with a 1/f-like spectrum (as described
in [50]), shown as a solid line in both plots, is reconstructed based on measurements.
In (a) data is available only at the two ends of the interval,
while in (b) coarse scale (i.e. locally averaged) measurements are fused to
improve signal interpolation. In (c) and (d) analogous results are shown for
the multiscale data fusion and interpolation of a Gauss-Markov process.
[Figure 4 image; label: 2 successive horocycles]

Figure 4: A more symmetric depiction of the dyadic tree, illustrating the
notion of a boundary point $-\infty$, horocycles, and the “parent” $s \wedge t$ of nodes
$s$ and $t$ (see the text for explanations).
Figure 5: Illustrating (in bold) the skeleton of a translation. As indicated in
the  figure,  any  translation  with  this  skeleton  must  map  the  subtree  extending 
away  from  any  node  on  the  skeleton  onto  the  corresponding  subtree  of  the 
next  node.  There  are,  however,  many  ways  in  which  this  can  be  done  (e.g. 
by  “pivoting”  isometries  within  any  of  these  subtrees). 
Figure 6: Illustrating the nature of the construction required in developing
recursions for $E_{t,n}$ and Here if t is the node in the lower left-hand corner,
then the elements of $E_{t,4}$ are the prediction errors at the two points indicated
by diamonds given the data spanned by the circles. The elements of
are the prediction errors at the four points indicated by squares given
again the data in The elementary “pivoting” isometries indicated in
the figure allow us to obtain the result on PARCOR coefficients described in
the text.
Figure 7: Illustrating the covariance matrix of a set of samples of a first-order
Gauss-Markov  process  with  covariance  of  the  form  exp~“^*L  Black  corre¬ 
sponds  to  a  value  of  1  with  lighter  shades  representing  smaller  values.  The 
covariance  of  this  process  decays  exponentially  as  we  move  away  from  the 
main  diagonal. 
Figure 8: The matrix of correlation coefficients (i.e. covariance divided by
the  square  root  of  the  product  of  variances)  for  the  wavelet-transform  of  the 
Gauss-Markov  process  of  Figure  7  using  an  8-tap  QMF. 


AFIT/AFOSR  Wavelets  Workshop  421 


©1992 IEEE. Reprinted, with permission, from IEEE Transactions on Information Theory, Vol. 38, No. 2,
pp. 785-800, March 1992.

Permission to copy without fee all or part of this material is granted provided that the copies are not made
or distributed for direct commercial advantage, the IEEE copyright notice and the title of the publication
and its date appear, and notice is given that copying is by permission of the Institute of Electrical and
Electronics Engineers. To copy otherwise, or to republish, requires a fee and specific permission.


Wavelet-Based Representations for a Class of Self-Similar
Signals with Application to Fractal Modulation

Gregory W. Wornell and Alan V. Oppenheim


Abstract 

A potentially important family of self-similar signals is introduced based upon a deterministic
scale-invariance characterization. These signals, which we refer to as “dy-homogeneous” signals because
they generalize the well-known homogeneous functions, have highly convenient representations
in terms of orthonormal wavelet bases. In particular, wavelet representations can be exploited to
construct orthonormal “self-similar” bases for these signals. The spectral and fractal characteristics
of dy-homogeneous signals make them appealing candidates for use in a number of applications.
As one potential example, we consider their use in a communications-based context. Specifically,
we develop a strategy for embedding information into a dy-homogeneous waveform on multiple
time-scales. This multirate modulation strategy, which we term “fractal modulation,” is potentially
well-suited for use with noisy channels of simultaneously unknown duration and bandwidth.
Computationally efficient modulators and demodulators are suggested for the scheme, and the results of
a preliminary performance evaluation are presented. Although not yet a fully developed protocol,
fractal modulation represents a potentially viable paradigm for communication.

Index Terms: fractals, wavelets, modulation theory, spread spectrum


1  Introduction 

Signals with self-similar properties, i.e., signals which retain many of their essential
characteristics under time scaling, arise frequently in physical processes and also are potentially
important in signal generation for communications, remote sensing, and many other
applications. The most extensively studied class of such signals are those random processes
which exhibit statistical self-similarity, e.g., processes whose autocorrelation functions
remain invariant to within an amplitude factor under arbitrary scalings of the time axis. An
important family of such random processes are typically referred to as 1/f processes. These
processes are often used in modeling natural landscapes, the distribution of earthquakes,
ocean waves, turbulent flow, the pattern of errors on communication channels, and many
other natural phenomena.

This work has been supported in part by the Advanced Research Projects Agency monitored by ONR
under Contract No. N00014-89-J-1489, and the Air Force Office of Scientific Research under Grant No.
AFOSR-91-0034.

The authors are with the Research Laboratory of Electronics, Massachusetts Institute of Technology,
Cambridge, MA 02139.

In this paper we consider signals that exhibit deterministic self-similarity, whereby the
signal itself remains invariant to within an amplitude factor under arbitrary scaling of the
time axis. This class of signals, referred to as homogeneous signals [1], is fairly restricted.
However, by generalizing the class of homogeneous signals to require self-similarity only
under time scaling by integer powers of two, a family of signals results with potential use as
waveforms in a range of engineering applications. As an example of one promising direction
for applications, we consider the use of homogeneous signal sets in a communications-based
context. Specifically, we develop an approach for embedding information into homogeneous
waveforms which we term “fractal modulation.” Because the resulting waveforms have the
property that the information is contained within multiple time scales and frequency bands,
we are able to show that such signals are well-suited for transmission over noisy channels
of simultaneously unknown duration and bandwidth. This is a reasonable model not only for
many physical channels, but also for the receiver constraints inherent in many point-to-point
and broadcast communication scenarios. While this proposed modulation scheme is
very preliminary and there are many unresolved issues to be explored, it is suggestive of
potential ways in which homogeneous signals can perhaps be exploited.

Our  approach  to  the  analysis  and  representation  of  homogeneous  signals  is  based  on 
the  use  of  orthonormal  wavelet  bases.  These  bases,  which  have  the  property  that  all  basis 
functions  are  dilations  and  translations  of  some  prototype  function,  are  in  many  respects 
ideally suited for use with self-similar signals [2]. Furthermore, because wavelet transformations
can be implemented in a computationally efficient manner, the wavelet transform
is  not  only  a  theoretically  important  tool,  but  a  practical  one  as  well. 

In Section 2, we briefly summarize the notation and properties of wavelet bases to be
used in the remainder of the paper. Section 3 introduces and develops the generalized
family of homogeneous signals defined in terms of a dyadic scale-invariance property. We
distinguish between two classes: energy-dominated and power-dominated, and develop their
spectral properties. We show that orthonormal self-similar bases can be constructed for
homogeneous signals using wavelets. Using these representations, we then derive efficient
discrete-time algorithms for synthesizing and analyzing homogeneous signals. Section 4
develops the concept of fractal modulation. In particular, we use the orthonormal self-similar
basis expansions derived in Section 3 to develop an approach for modulating information
sequences onto homogeneous signals. After developing the corresponding optimal receiver,
we evaluate the performance of the resulting scheme in the context of a particular channel
model and make comparisons to more traditional forms of modulation. Finally, Section 5
summarizes the principal contributions of the paper and suggests some interesting and
potentially important directions for future research.

2  Wavelet  Notation 

In  this  section,  we  establish  the  notational  conventions  and  terminology  for  the  aspects  of 
wavelet  theory  we  shall  exploit  in  this  paper.  For  a  more  general  review  of  the  theory  of 
orthonormal wavelet bases, see, e.g., the classic references [3], [4].

An orthonormal wavelet transformation of a signal $x(t)$ is described in terms of the
synthesis/analysis equations¹

$$x(t) = \sum_m \sum_n x_n^m\,\psi_n^m(t) \tag{1a}$$

$$x_n^m = \int_{-\infty}^{\infty} x(t)\,\psi_n^m(t)\,dt \tag{1b}$$

and has the special property that the orthogonal basis functions are all dilations and
translations of a single function referred to as the basic wavelet $\psi(t)$. In particular,

$$\psi_n^m(t) = 2^{m/2}\,\psi(2^m t - n) \tag{2}$$

where $m$ and $n$ are the dilation and translation indices, respectively.
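
As a concrete instance of the dilation and translation rule, the sketch below builds basis functions $2^{m/2}\psi(2^m t - n)$ from the Haar wavelet and checks orthonormality by numerical integration. The Haar prototype is an assumed example (it is not the ideal bandpass wavelet discussed in this section), and the grid and function names are illustrative.

```python
import numpy as np

def haar(t):
    """Haar prototype wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere (assumed example)."""
    t = np.asarray(t, dtype=float)
    first_half = ((t >= 0.0) & (t < 0.5)).astype(float)
    second_half = ((t >= 0.5) & (t < 1.0)).astype(float)
    return first_half - second_half

def psi(m, n, t):
    """Dilated and translated basis function 2^(m/2) * psi(2^m t - n)."""
    return 2.0 ** (m / 2.0) * haar(2.0 ** m * t - n)

dt = 2.0 ** -10
t = np.arange(-4.0, 4.0, dt)          # grid aligned with the dyadic breakpoints

def inner(f, g):
    """Riemann-sum approximation of the inner product of two sampled functions."""
    return float(np.sum(f * g) * dt)

norm_00 = inner(psi(0, 0, t), psi(0, 0, t))     # unit norm at scale m = 0
norm_10 = inner(psi(1, 0, t), psi(1, 0, t))     # unit norm at the finer scale m = 1
cross_scale = inner(psi(0, 0, t), psi(1, 0, t)) # orthogonal across scales
cross_shift = inner(psi(0, 0, t), psi(0, 1, t)) # orthogonal across translations
```

The factor $2^{m/2}$ is what keeps the norm equal to one as the support shrinks with increasing $m$.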

¹We shall assume throughout that all summations over $m$ and $n$ extend from $-\infty$ to $\infty$ unless otherwise
noted.

The Fourier transform of the basic wavelet, denoted $\Psi(\omega)$, often has a bandpass character,
at least roughly. As a consequence, wavelet decompositions may be interpreted rather
naturally in terms of a critically sampled generalized constant-Q or octave-band filter bank.
In fact, an example of a wavelet basis, and one which will play an important role in this
paper, is the ideal bandpass wavelet basis. In this specific case, the Fourier transform of
the wavelet, which we denote by $\Psi(\omega)$, is

$$\Psi(\omega) = \begin{cases} 1 & \pi < |\omega| \le 2\pi \\ 0 & \text{otherwise} \end{cases} \tag{3}$$


In many applications, it is useful to impose some degree of regularity on the wavelet
basis. As is well-known [4], a sufficient condition for a wavelet basis to possess $R$th-order
regularity

$$\Psi(\omega) \sim O\left(|\omega|^{-R}\right), \quad \omega \to \infty$$

where $R$ is some positive integer, is that the wavelet have $R$ vanishing moments, i.e.,

$$\int_{-\infty}^{\infty} t^r\,\psi(t)\,dt = (j)^r\,\Psi^{(r)}(0) = 0, \quad r = 0, 1, \ldots, R-1.$$

Many examples of wavelets with such regularity have been developed in the literature; see,
e.g., [4].

A broad class of orthonormal wavelet bases may also be conveniently interpreted in terms
of multiresolution signal analysis. Associated with each such wavelet basis is a corresponding
scaling function $\phi(t)$ having a Fourier transform $\Phi(\omega)$ that is at least roughly lowpass. The
scaling function associated with the ideal bandpass wavelet basis, in fact, has an ideal
lowpass Fourier transform

$$\Phi(\omega) = \begin{cases} 1 & |\omega| \le \pi \\ 0 & \text{otherwise} \end{cases}$$

A resolution-limited approximation $A_m x(t)$ to a signal $x(t)$ in which details on scales
$2^m$ and finer are discarded is obtained via the orthonormal expansion

$$A_m x(t) = \sum_n a_n^m\,\phi_n^m(t)$$

where the $\phi_n^m(t)$ are also all dilations and translations of one another, viz.,

$$\phi_n^m(t) = 2^{m/2}\,\phi(2^m t - n)$$

and where the coefficients $a_n^m$ are obtained by projection:

$$a_n^m = \int_{-\infty}^{\infty} x(t)\,\phi_n^m(t)\,dt \tag{5}$$

For these signal approximations, the detail signal capturing the information in $x(t)$
between scales $2^m$ and $2^{m+1}$ has the orthonormal expansion

$$D_m x(t) = A_{m+1}x(t) - A_m x(t) = \sum_n x_n^m\,\psi_n^m(t).$$

The multiresolution signal analysis interpretation of wavelet bases also leads to efficient
discrete-time algorithms for implementing wavelet transformations. In particular, associated
with every wavelet-based multiresolution analysis is a quadrature mirror filter (QMF)
pair whose unit-sample responses $h[n]$ and $g[n]$ have at least roughly lowpass and highpass
discrete-time Fourier transforms $H(\omega)$ and $G(\omega)$, respectively. These filters are exploited
in the following filter-downsample analysis algorithm

$$a_n^m = \sum_l h[l - 2n]\,a_l^{m+1} \tag{6a}$$

$$x_n^m = \sum_l g[l - 2n]\,a_l^{m+1} \tag{6b}$$

which may be applied recursively to extract the wavelet coefficients $x_n^m$ at successively
coarser scales. In a complementary manner, the following upsample-filter-merge synthesis
algorithm

$$a_n^{m+1} = \sum_l \left\{ h[n - 2l]\,a_l^m + g[n - 2l]\,x_l^m \right\} \tag{6c}$$

may be applied recursively to reconstruct the coefficients $a_n^m$ of an increasingly fine-scale
approximation to a signal $x(t)$. Collectively, eqs. (6) constitute what has become known as
the Discrete Wavelet Transform (DWT).
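
The filter-downsample and upsample-filter-merge recursions can be sketched as below. This is a hedged illustration rather than the authors' implementation: it assumes the analysis sums run over $h[l-2n]$ and $g[l-2n]$ and the synthesis sums over $h[n-2l]$ and $g[n-2l]$, uses the 2-tap Haar pair as an example QMF, and applies periodic extension at the boundaries. All function names are hypothetical.

```python
import numpy as np

def analyze(a_fine, h, g):
    """One filter-downsample step: coarse approximation and detail coefficients."""
    n_coarse = len(a_fine) // 2
    a = np.zeros(n_coarse)
    x = np.zeros(n_coarse)
    for n in range(n_coarse):
        for k in range(len(h)):                  # k = l - 2n
            l = (2 * n + k) % len(a_fine)        # periodic extension (an assumption)
            a[n] += h[k] * a_fine[l]
            x[n] += g[k] * a_fine[l]
    return a, x

def synthesize(a, x, h, g):
    """One upsample-filter-merge step: rebuild the finer approximation."""
    n_fine = 2 * len(a)
    out = np.zeros(n_fine)
    for l in range(len(a)):
        for k in range(len(h)):                  # k = n - 2l
            n = (2 * l + k) % n_fine
            out[n] += h[k] * a[l] + g[k] * x[l]
    return out

def dwt(signal, h, g, levels):
    """Apply the analysis step recursively; details are stored finest scale first."""
    a, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        a, x = analyze(a, h, g)
        details.append(x)
    return a, details

def idwt(a, details, h, g):
    """Apply the synthesis step recursively, from the coarsest scale to the finest."""
    for x in reversed(details):
        a = synthesize(a, x, h, g)
    return a

h = np.array([1.0, 1.0]) / np.sqrt(2.0)      # Haar lowpass filter (example QMF)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)     # Haar highpass filter
signal = np.random.default_rng(0).standard_normal(16)
coarse, details = dwt(signal, h, g, levels=3)
rebuilt = idwt(coarse, details, h, g)
```

Since each step halves the number of approximation coefficients, the total work over all levels stays proportional to the signal length.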


3  Deterministically  Self-Similar  Signals 

Signals  x(t)  satisfying  the  detenninistic  scale-invariance  property 

x(i)  =  a"^x{at)  (7) 

for  all  a  >  0,  are  generally  referred  to  in  mathematics  as  homogeneous  functions  of  degree 
E.  As  shown  by  Gel’fand  [1],  homogeneous  functions  can  be  parameterized  with  only  a  few 
constants.  As  such,  they  constitute  a  rather  limited  class  of  signal  models  in  mzmy  contexts. 

A comparatively richer class of signal models is obtained by considering waveforms which
are required to satisfy (7) only for values of $a$ that are integer powers of two, i.e., signals
that satisfy the dyadic self-similarity property

$$x(t) = 2^{-kH} x(2^k t) \qquad (8)$$

for all integers $k$. While we shall use the generic term "homogeneous signal" to refer to
signals satisfying (8), when there is risk of confusion in our subsequent development we will
specifically refer to signals satisfying (8) as dy-homogeneous.
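The distinction between (7) and (8) can be checked numerically. The sketch below uses the test signal $x(t) = t^H \cos(2\pi \log_2 t)$ for $t > 0$, which is our own illustrative choice (an amplitude-weighted variant of a time-warped sinusoid); it satisfies (8) for every integer $k$ but not (7) for general $a$. The function names and the default $H = 0.5$ are assumptions made for the example.

```python
import math


def x(t, H=0.5):
    """Assumed example signal: x(t) = t^H cos(2*pi*log2 t), defined for t > 0."""
    return t ** H * math.cos(2.0 * math.pi * math.log2(t))


def dyadic_residual(t, k, H=0.5):
    """x(t) - 2^{-kH} x(2^k t); this is zero whenever (8) holds."""
    return x(t, H) - 2.0 ** (-k * H) * x(2.0 ** k * t, H)


def scaled_residual(t, a, H=0.5):
    """x(t) - a^{-H} x(a t); this is zero whenever (7) holds for the given a."""
    return x(t, H) - a ** (-H) * x(a * t, H)
```

Evaluating `dyadic_residual` at any positive `t` and integer `k` gives a value at rounding-error level, whereas `scaled_residual` with `a = 3.0` is clearly nonzero, showing that the dy-homogeneous class is strictly larger than the class of homogeneous functions.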

Homogeneous signals have spectral characteristics very much like those of 1/f processes
and, in fact, have fractal properties as well. Specifically, although all non-trivial
homogeneous signals have infinite energy and many have infinite power, there are nevertheless
some such signals with which one can associate a generalized 1/f-like Fourier transform,
and others with which one can associate a generalized 1/f-like power spectrum. We
distinguish between these two classes of homogeneous signals in our subsequent treatment,
denoting them energy-dominated and power-dominated homogeneous signals, respectively.
As we develop in Sections 3.1 and 3.2, orthonormal wavelet basis expansions constitute
particularly convenient and efficient representations for these two classes of signals.

3.1 Energy-Dominated Homogeneous Signals

Definition 1 A dy-homogeneous signal $x(t)$ is said to be energy-dominated if when $x(t)$ is
filtered by an ideal bandpass filter with frequency response

$$B_0(\omega) = \begin{cases} 1 & \pi < |\omega| \le 2\pi \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

the resulting signal $x_0(t)$ has finite energy, i.e.,

$$\int_{-\infty}^{\infty} x_0^2(t)\,dt < \infty.$$


The choice of passband edges at $\pi$ and $2\pi$ in our definition is, in fact, somewhat arbitrary.
In particular, substituting in the definition any passband that includes one entire frequency
octave but does not include $\omega = 0$ or $\omega = \infty$ leads to precisely the same class of signals.
However, our particular choice is sufficient and is made in anticipation of the representation
of this class of signals in terms of a wavelet basis.

The class of energy-dominated homogeneous signals includes both reasonably regular
functions, such as the constant $x(t) = 1$, the ramp $x(t) = t$, the time-warped sinusoid
$x(t) = \cos[2\pi \log_2 t]$, and the unit step function $x(t) = u(t)$, as well as singular functions,
such as $x(t) = \delta(t)$ and its derivatives. We denote by $E^H$ the collection of all
energy-dominated homogeneous signals of degree $H$. The following theorem allows us to interpret
the notion of spectra for such signals. A straightforward but detailed proof is provided in
Appendix A.

Theorem 2 When an energy-dominated homogeneous signal $x(t)$ of degree $H$ is filtered by
an ideal bandpass filter with frequency response

$$B(\omega) = \begin{cases} 1 & \omega_L < |\omega| < \omega_U \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

for arbitrary $0 < \omega_L < \omega_U < \infty$, the resulting signal $y(t)$ has finite energy and a Fourier
transform of the form

$$Y(\omega) = \begin{cases} X(\omega) & \omega_L < |\omega| < \omega_U \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

where $X(\omega)$ is some function that is independent of $\omega_L$ and $\omega_U$ and has octave-spaced ripple,
i.e., for all integers $k$,

$$|\omega|^{H+1} X(\omega) = |2^k \omega|^{H+1} X(2^k \omega). \qquad (12)$$

Since in this theorem $X(\omega)$ does not depend on $\omega_L$ or $\omega_U$, this function may be
interpreted as the generalized Fourier transform of $x(t)$. Furthermore, (12) implies that the
generalized Fourier transform of signals in $E^H$ obeys a 1/f-like (power-law) relationship,
viz.,

$$|X(\omega)| \sim \frac{1}{|\omega|^{H+1}}.$$

We note that because (11) excludes $\omega = 0$ and $\omega = \infty$, the mapping

$$x(t) \leftrightarrow X(\omega)$$

is not one to one. As an example, $x(t) = 1$ and $x(t) = 2$ are both in $E^H$ for $H = 0$, yet
both have $X(\omega) = 0$ for $\omega > 0$. In order to accommodate this behavior in our subsequent
theoretical development, all signals having a common $X(\omega)$ will be combined into an
equivalence class. For example, two homogeneous functions $f(t)$ and $g(t)$ are equivalent if they
differ by a homogeneous function whose frequency content is concentrated at the origin, for
example in the case that $H$ is an integer.

Because the dyadic self-similarity property (8) of dy-homogeneous signals is very
similar to the dyadic scaling relationship between basis functions in an orthonormal wavelet
basis, wavelets provide a particularly nice representation for this family of signals.
Specifically, with $x(t)$ denoting an energy-dominated homogeneous signal, the expansion in an
orthonormal wavelet basis is

$$x(t) = \sum_m \sum_n x_n^m\,\psi_n^m(t) \qquad (13a)$$

$$x_n^m = \int_{-\infty}^{\infty} x(t)\,\psi_n^m(t)\,dt. \qquad (13b)$$


Since $x(t)$ satisfies (8) and since $\psi_n^m(t)$ satisfies (2), it easily follows from (13b) that for
homogeneous signals

$$x_n^m = \beta^{-m/2}\,x_n^0 \qquad (14)$$

where

$$\beta = 2^{2H+1} = 2^{\gamma}. \qquad (15)$$

Denoting $x_n^0$ by $q[n]$, (13a) then becomes

$$x(t) = \sum_m \sum_n \beta^{-m/2}\,q[n]\,\psi_n^m(t) \qquad (16)$$

from which we see that $x(t)$ is completely specified in terms of $q[n]$. We term $q[n]$ a
generating sequence for $x(t)$ since, as we shall see, this representation leads to techniques
for synthesizing useful approximations to homogeneous signals in practice.
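The step from (13b) to (14) can be written out explicitly. The following short derivation is a sketch that assumes (2) takes the usual form $\psi_n^m(t) = 2^{m/2}\psi(2^m t - n)$ and uses the change of variable $\tau = 2^m t$ together with (8) for $k = m$, which gives $x(2^{-m}\tau) = 2^{-mH} x(\tau)$.

```latex
\begin{aligned}
x_n^m &= \int_{-\infty}^{\infty} x(t)\, 2^{m/2}\, \psi(2^m t - n)\, dt \\
      &= 2^{-m/2} \int_{-\infty}^{\infty} x(2^{-m}\tau)\, \psi(\tau - n)\, d\tau \\
      &= 2^{-m/2}\, 2^{-mH} \int_{-\infty}^{\infty} x(\tau)\, \psi(\tau - n)\, d\tau \\
      &= 2^{-m(2H+1)/2}\, x_n^0 \;=\; \beta^{-m/2}\, x_n^0 .
\end{aligned}
```

The last line uses $\beta = 2^{2H+1}$ from (15), so the wavelet coefficients at every scale are amplitude-scaled copies of those at scale $m = 0$.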

Let us now specifically choose the ideal bandpass wavelet basis, whose basis functions
we denote by

$$\tilde{\psi}_n^m(t) = 2^{m/2}\,\tilde{\psi}(2^m t - n)$$

where $\tilde{\psi}(t)$ is the ideal bandpass wavelet whose Fourier transform is given by (3). If we
sample the output $x_0(t)$ of the filter in Definition 1 at unit rate, we obtain the sequence
$q[n] = \tilde{x}_n^0$, where $\tilde{x}_n^m$ denotes the coefficients of expansion of $x(t)$ in terms of the ideal
bandpass wavelet basis. Since $x_0(t)$ in Definition 1 has the orthonormal expansion

$$x_0(t) = \sum_n q[n]\,\tilde{\psi}_n^0(t) \qquad (17)$$

we have

$$\int_{-\infty}^{\infty} x_0^2(t)\,dt = \sum_n q^2[n]. \qquad (18)$$

Consequently, a homogeneous function is energy-dominated if and only if its generating
sequence in terms of the ideal bandpass wavelet basis has finite energy, i.e.,

$$\sum_n q^2[n] < \infty.$$

A convenient inner product between two energy-dominated homogeneous signals $f(t)$
and $g(t)$ can be defined as

$$\langle f, g \rangle_{\tilde{\psi}} = \int_{-\infty}^{\infty} f_0(t)\,g_0(t)\,dt$$

where the signals $f_0(t)$ and $g_0(t)$ are the responses of the bandpass filter (9) to $f(t)$ and $g(t)$,
respectively. Exploiting (17) we may more conveniently express this inner product in terms
of $a[n]$ and $b[n]$, the respective generating sequences of $f(t)$ and $g(t)$ under the bandpass
wavelet basis, as

$$\langle f, g \rangle_{\tilde{\psi}} = \sum_n a[n]\,b[n]. \qquad (19)$$

With this inner product, $E^H$ constitutes a Hilbert space and the induced norm on $E^H$ is

$$\|x\|_{\tilde{\psi}}^2 = \int_{-\infty}^{\infty} x_0^2(t)\,dt = \sum_n q^2[n]. \qquad (20)$$

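One can readily construct "self-similar" bases for $E^H$. Indeed, the ideal bandpass
wavelet expansion (16) immediately provides an orthonormal basis for $E^H$. In particular, for any
$x(t) \in E^H$, we have the synthesis/analysis pair

$$x(t) = \sum_n q[n]\,\tilde{\theta}_n^H(t) \qquad (21a)$$

$$q[n] = \langle x, \tilde{\theta}_n^H \rangle_{\tilde{\psi}} \qquad (21b)$$

where one can easily verify that the basis functions

$$\tilde{\theta}_n^H(t) = \sum_m \beta^{-m/2}\,\tilde{\psi}_n^m(t) \qquad (22)$$

are self-similar, orthogonal, and have unit norm.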

The fact that the ideal bandpass basis is unrealizable means that (21a) is not a practical
mechanism for synthesizing or analyzing homogeneous signals. However, more practical
wavelet bases are equally suitable for defining an inner product for the Hilbert space $E^H$.
In fact, we shall show that a broad class of wavelet bases can be used to construct such inner
products, and that, as a consequence, some highly efficient algorithms arise for processing
homogeneous signals.

Not every orthonormal wavelet basis can be used to define inner products for $E^H$. In
order to determine which orthonormal wavelet bases can be used to define inner products
for $E^H$, we must determine for which wavelets

$$q[n] = \int_{-\infty}^{\infty} x(t)\,\psi_n^0(t)\,dt \in \ell^2(\mathbb{Z}) \iff x(t) = \sum_m \sum_n \beta^{-m/2}\,q[n]\,\psi_n^m(t) \in E^H.$$

That is, we seek conditions on a wavelet basis such that the sequence

$$q[n] = \int_{-\infty}^{\infty} x(t)\,\psi_n^0(t)\,dt$$

has finite energy whenever the homogeneous signal $x(t)$ is energy-dominated, and
simultaneously such that the homogeneous signal

$$x(t) = \sum_m \sum_n \beta^{-m/2}\,q[n]\,\psi_n^m(t)$$

is energy-dominated whenever the sequence $q[n]$ has finite energy. Our main result is
presented in terms of the following theorem. A proof of this theorem is provided in Appendix B.

Theorem 3 Consider an orthonormal wavelet basis such that $\psi(t)$ has $R$ vanishing
moments for some integer $R \ge 1$, i.e.,

$$\Psi^{(r)}(0) = 0, \quad r = 0, 1, \ldots, R-1 \qquad (23)$$

and let

$$x(t) = \sum_m \sum_n \beta^{-m/2}\,q[n]\,\psi_n^m(t)$$

be a dy-homogeneous signal whose degree $H$ is such that $\gamma = \log_2 \beta = 2H + 1$ satisfies
$0 < \gamma < 2R - 1$. Then $x(t)$ is energy-dominated if and only if $q[n]$ has finite energy.

This theorem implies that we may choose for our Hilbert space $E^H$ from among a large
number of inner products whose induced norms are all equivalent. In particular, for any
wavelet $\psi(t)$ with sufficiently many vanishing moments, we may define the inner product
between two functions $f(t)$ and $g(t)$ in $E^H$ whose generating sequences are $a[n]$ and $b[n]$,
respectively, as

$$\langle f, g \rangle_{\psi} = \sum_n a[n]\,b[n]. \qquad (24)$$

Of course, this collection of inner products is almost surely not exhaustive. Even for
wavelet-based inner products, Theorem 3 asserts only that the vanishing moment condition is
sufficient to ensure that the inner product generates an equivalent norm. It seems unlikely that
the vanishing moment condition is a necessary condition.

The wavelet-based norms for $E^H$ constitute a highly convenient and practical collection
from which to choose in applications involving the use of homogeneous signals. Indeed, each
associated wavelet-based inner product leads immediately to an orthonormal self-similar
basis for $E^H$: if $x(t) \in E^H$, then

$$x(t) = \sum_n q[n]\,\theta_n^H(t) \qquad (25a)$$

$$q[n] = \langle x, \theta_n^H \rangle_{\psi} \qquad (25b)$$

where, again, the basis functions

$$\theta_n^H(t) = \sum_m \beta^{-m/2}\,\psi_n^m(t) \qquad (26)$$

are all self-similar, mutually orthogonal, and have unit norm.

Finally, we remark that wavelet-based characterizations also give rise to a convenient
expression for the generalized Fourier transform of an energy-dominated homogeneous
signal $x(t)$. In particular, if we take the Fourier transform of (16) we get, via some routine
algebra,

$$X(\omega) = \sum_m 2^{-(H+1)m}\,\Psi(2^{-m}\omega)\,Q(2^{-m}\omega) \qquad (27)$$

where $Q(\omega)$ is the discrete-time Fourier transform of $q[n]$. This spectrum is to be interpreted
in the sense of Theorem 2, i.e., $X(\omega)$ defines the spectral content of the output of a bandpass
filter at every frequency $\omega$ within the passband.
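The routine algebra behind (27) can be sketched as follows. The sketch assumes the conventions $\Psi(\omega) = \int \psi(t) e^{-j\omega t}\,dt$ and $Q(\omega) = \sum_n q[n] e^{-j\omega n}$, which are our assumptions about notation, and it transforms (16) term by term without addressing convergence.

```latex
\begin{aligned}
\int_{-\infty}^{\infty} \psi_n^m(t)\, e^{-j\omega t}\, dt
  &= \int_{-\infty}^{\infty} 2^{m/2}\, \psi(2^m t - n)\, e^{-j\omega t}\, dt
   = 2^{-m/2}\, e^{-j 2^{-m}\omega n}\, \Psi(2^{-m}\omega), \\
X(\omega)
  &= \sum_m \beta^{-m/2}\, 2^{-m/2}\, \Psi(2^{-m}\omega) \sum_n q[n]\, e^{-j 2^{-m}\omega n} \\
  &= \sum_m 2^{-(H+1)m}\, \Psi(2^{-m}\omega)\, Q(2^{-m}\omega),
\end{aligned}
```

where the final step uses $\beta^{-m/2}\,2^{-m/2} = 2^{-m(2H+1)/2}\,2^{-m/2} = 2^{-(H+1)m}$.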

3.2  Power-Dominated  Homogeneous  Signals 

Energy-dominated homogeneous signals have infinite energy. In fact, most have infinite
power as well. However, there are other infinite-power homogeneous signals that are not
energy-dominated. In this section, we consider a more general class of infinite-power
homogeneous signals referred to as power-dominated homogeneous signals, which will find
application in Section 4. The definition and properties closely parallel those for energy-dominated
homogeneous signals.

Definition 4 A dy-homogeneous signal $x(t)$ is said to be power-dominated if when $x(t)$ is
filtered by an ideal bandpass filter with frequency response (9) the resulting signal $x_0(t)$ has
finite power, i.e.,

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x_0^2(t)\,dt < \infty.$$

The notation $P^H$ will be used to designate the class of power-dominated homogeneous
signals of degree $H$. Moreover, while our definition necessarily includes the
energy-dominated signals, which have zero power, insofar as our discussion is concerned they
constitute a degenerate case.

Analogous to Theorem 2 for the energy-dominated case, we can establish the following
theorem describing the spectral properties of power-dominated homogeneous signals.

Theorem 5 When a power-dominated homogeneous signal $x(t)$ is filtered by an ideal
bandpass filter with frequency response (10), the resulting signal $y(t)$ has finite power and a
power spectrum of the form

$$S_y(\omega) = \lim_{T \to \infty} \frac{1}{2T} \left| \int_{-T}^{T} y(t)\,e^{-j\omega t}\,dt \right|^2 = \begin{cases} S_x(\omega) & \omega_L < |\omega| < \omega_U \\ 0 & \text{otherwise} \end{cases} \qquad (28)$$

where $S_x(\omega)$ is some function that is independent of $\omega_L$ and $\omega_U$ and has octave-spaced ripple,
i.e., for all integers $k$,

$$|\omega|^{2H+1} S_x(\omega) = |2^k \omega|^{2H+1} S_x(2^k \omega). \qquad (29)$$

The details of the proof of this theorem are contained in Appendix C, although it is
identical in style to the proof of its counterpart, Theorem 2. Note that since $S_x(\omega)$ in the
theorem does not depend on $\omega_L$ or $\omega_U$, this function may be interpreted as the generalized
power spectrum of $x(t)$. Furthermore, the relation (29) implies that signals in $P^H$ have a
generalized time-averaged power spectrum that is 1/f-like, i.e.,

$$S_x(\omega) \sim \frac{1}{|\omega|^{\gamma}} \qquad (30)$$

where, via (15), $\gamma = 2H + 1$.

Theorem 5 directly implies that a homogeneous signal $x(t)$ is power-dominated if and
only if its generating sequence $q[n]$ in the ideal bandpass wavelet basis has finite power, i.e.,

$$\lim_{L \to \infty} \frac{1}{2L+1} \sum_{n=-L}^{L} q^2[n] < \infty.$$

Similarly we can readily deduce from the results of Section 3.1 that, in fact, for any
orthonormal wavelet basis with sufficiently many vanishing moments $R$ so that $0 < \gamma < 2R - 1$,
the generating sequence for a homogeneous signal of degree $H$ in that basis has finite power
if and only if the signal is power-dominated. This implies that when we use (25a) with such
wavelets to synthesize a homogeneous signal $x(t)$ using an arbitrary finite-power sequence
$q[n]$, we are assured that $x(t) \in P^H$. Likewise, when we use (25b) to analyze any signal
$x(t) \in P^H$, we are assured that $q[n]$ has finite power.


Remarks 

Energy-dominated homogeneous signals of arbitrary degree $H$ can be highly regular, at
least away from $t = 0$. In contrast, power-dominated homogeneous signals typically have
a fractal structure similar to the statistically self-similar 1/f processes of corresponding
degree $H$, whose power spectra are also of the form (30) with $\gamma = 2H + 1$. One might
reasonably conjecture that power-dominated homogeneous signals and 1/f processes of the
same degree also have identical Hausdorff-Besicovitch dimensions [5], when defined. Indeed,
despite their obvious structural differences, power-dominated homogeneous signals and 1/f
processes "look" remarkably similar in a qualitative sense. This is apparent in Fig. 1, where
we depict the sample path of a 1/f process alongside a power-dominated homogeneous
signal of the same degree whose generating sequence has been taken from a white random
process. We stress that in Fig. 1(a), the self-similarity of the 1/f process is statistical,
i.e., it does not satisfy (8) but its autocorrelation function does. In Fig. 1(b), the
self-similarity of the homogeneous signal is deterministic. In fact, while the wavelet coefficients
of homogeneous signals are identical from scale to scale to within an amplitude factor, i.e.,

$$x_n^m = \beta^{-m/2}\,q[n],$$

the wavelet coefficients of 1/f processes have only the same second-order statistics from
scale to scale to within an amplitude factor, i.e.,

$$E\left[x_n^m\,x_l^m\right] = \beta^{-m}\,\rho[n - l]$$

for some function $\rho[n]$ that is independent of $m$ [2] [6].

Finally, we remark that not all power-dominated homogeneous signals have spectra that
are bounded on $\pi \le \omega \le 2\pi$. An interesting subclass of power-dominated homogeneous
signals with such unbounded spectra will, in fact, arise in our development of fractal
modulation. For these signals, $x_0(t)$ as defined in Definition 4 is periodic, so we refer to this class
of power-dominated homogeneous signals as periodicity-dominated. It is straightforward to
establish that these homogeneous signals have the property that when passed through an
arbitrary bandpass filter of the form (10) the output is periodic as well. Furthermore, their
power spectra consist of impulses whose areas decay according to a $1/|\omega|^{\gamma}$ relationship. An
important class of periodicity-dominated homogeneous signals can be generated through a
wavelet-based synthesis of the form (16) in which the generating sequence $q[n]$ is periodic.

3.3  Discrete-Time  Algorithms  for  Processing  Homogeneous  Signals 

Orthonormal wavelet representations provide some useful insights into homogeneous signals.
For instance, because the sequence $q[n]$ is replicated at each scale in the representation (16)
of a homogeneous signal $x(t)$, the detail signals

$$D_m x(t) = \beta^{-m/2} \sum_n q[n]\,\psi_n^m(t)$$

representing $q[n]$ modulated into a particular octave band are simply time-dilated versions
of one another, to within an amplitude factor. The corresponding time-frequency portrait of
a homogeneous signal is depicted in Fig. 2, from which the scaling properties are apparent.
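The time-dilation relationship between detail signals can be made concrete with a small numerical sketch. It assumes the Haar wavelet, which is our illustrative choice; since the Haar wavelet has a single vanishing moment ($R = 1$), Theorem 3 applies for $0 < \gamma < 1$, so the examples use $H = -0.25$ ($\gamma = 0.5$). The function names and the finite scale range `m_lo`..`m_hi` are hypothetical.

```python
import math


def haar(t):
    """Haar wavelet psi(t): +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def detail(t, m, q, H):
    """D_m x(t) = beta^{-m/2} sum_n q[n] psi_n^m(t), psi_n^m(t) = 2^{m/2} psi(2^m t - n)."""
    beta = 2.0 ** (2.0 * H + 1.0)
    amp = beta ** (-m / 2.0) * 2.0 ** (m / 2.0)
    return amp * sum(qn * haar(2.0 ** m * t - n) for n, qn in enumerate(q))


def synthesize(t, q, H, m_lo, m_hi):
    """Finite-scale approximation to (16): sum of D_m x(t) over m_lo <= m <= m_hi."""
    return sum(detail(t, m, q, H) for m in range(m_lo, m_hi + 1))
```

With these definitions, `detail(t, m + 1, q, H)` equals `2 ** (-H) * detail(2 * t, m, q, H)`, which is the amplitude-scaled time dilation described above. Summing over a finite range of scales gives a signal that satisfies (8) only up to a shift of the scale range, i.e., `synthesize(t, q, H, m_lo, m_hi)` equals `2 ** (-H) * synthesize(2 * t, q, H, m_lo - 1, m_hi - 1)`.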
Figure 1: Comparison between the sample path of a 1/f process and a power-dominated
homogeneous signal. Both correspond to $\gamma = 1$ (i.e., $H = 0$). (a) A sample function of a
1/f process. (b) A power-dominated homogeneous signal.

Figure 2: The time-frequency portrait of a homogeneous signal of degree $H = -1/2$.


For purposes of illustration, the signal in this figure has degree $H = -1/2$ (i.e., $\beta = 1$),
which corresponds to the case in which $q[n]$ is scaled by the same amplitude factor in each
octave band. Clearly, the partitioning in such time-frequency portraits is idealized: in
general, there is both spectral and temporal overlap between cells.

Wavelet representations also lead to some highly efficient algorithms for synthesizing,
analyzing, and processing homogeneous signals just as they do for 1/f processes as discussed
in [7]. The signal processing structures we develop in this section are a consequence of
applying the DWT algorithm to the highly structured form of the wavelet coefficients of
homogeneous signals.

We have already encountered one discrete-time representation for a homogeneous signal
$x(t)$, namely that in terms of a generating sequence $q[n]$ which corresponds to the coefficients
of the expansion of $x(t)$ in an orthonormal basis for $E^H$. When the basis functions $\theta_n^H(t)$ are
derived from a wavelet basis according to (26), another useful discrete-time representation for $x(t)$
is available, which we now discuss.

Consider the coefficients $a_n^m$ characterizing the resolution-limited approximation $A_m x(t)$
of a homogeneous signal $x(t)$ with respect to a particular wavelet-based multiresolution
signal analysis. Since these coefficients are the projections of $x(t)$ onto dilations and
translations of the scaling function $\phi(t)$ according to (5), it is straightforward to verify that they,
too, are identical at all scales to within an amplitude factor, i.e.,

$$a_n^m = \beta^{-m/2}\,a_n^0. \qquad (31)$$

Consequently, the sequence $a_n^0$ is an alternative discrete-time characterization of $x(t)$, since
knowledge of it is sufficient to reconstruct $x(t)$ to arbitrary accuracy. For convenience,
we refer to $a_n^0$ as the characteristic sequence and denote it as $p[n]$. As is true for the
generating sequence, the characteristic sequence associated with $x(t)$ depends upon the
particular multiresolution analysis used; distinct multiresolution signal analyses generally
yield different characteristic sequences for any given homogeneous signal. We shall require
that the wavelet associated with any multiresolution analysis we consider have sufficiently
many vanishing moments that it meets the conditions of Theorem 3.

The characteristic sequence $p[n]$ is associated with a resolution-limited approximation to
the corresponding homogeneous signal $x(t)$. Specifically, $p[n]$ represents unit-rate samples
of the output of the filter, driven by $x(t)$, whose frequency response is the complex conjugate
of $\Phi(\omega)$. Because frequencies in the neighborhood of the spectral origin, where the spectrum
of $x(t)$ diverges, are passed by such a filter, $p[n]$ will often have infinite energy or, worse,
infinite power, even when the generating sequence $q[n]$ has finite energy.

The characteristic sequence can, in fact, be viewed as a discrete-time homogeneous
signal, and a theory can be developed following an approach directly analogous to that
used in Sections 3.1 and 3.2 for the case of continuous-time homogeneous signals. The
characteristic sequence satisfies the discrete-time self-similarity relation¹

$$\beta^{1/2}\,p[n] = \sum_k h[k - 2n]\,p[k] \qquad (32)$$

Figure 3: The discrete-time self-similarity identity for a characteristic sequence $p[n]$.

which is readily obtained by substituting for $a_n^m$ in the DWT analysis equation (6a) using
(31). Indeed, as depicted in Fig. 3, (32) is a statement that when $p[n]$ is lowpass filtered with
the conjugate filter whose unit-sample response is $h[-n]$ and then downsampled, we recover
an amplitude-scaled version of $p[n]$. Although characteristic sequences are, in an appropriate
sense, "generalized sequences," when highpass filtered with the corresponding conjugate
highpass filter whose unit-sample response is $g[-n]$, the output is a finite-energy or
finite-power sequence, depending on whether $p[n]$ corresponds to a homogeneous signal $x(t)$ that
is energy-dominated or power-dominated, respectively. Consequently, we can analogously
classify the sequence $p[n]$ as energy-dominated in the former case, and power-dominated in
the latter case. In fact, when the output of such a highpass filter is downsampled at rate
two, we recover, to within an amplitude factor, the generating sequence $q[n]$ associated with
the expansion of $x(t)$ in the corresponding wavelet basis, i.e.,

$$q[n] = \beta^{-1/2} \sum_k g[k - 2n]\,p[k]. \qquad (33)$$

This can be readily verified by substituting for $a_n^{m+1}$ and $x_n^m$ in the DWT analysis equation
(6b) using (31) and (14), and by recognizing that $a_n^0 = p[n]$ and $x_n^0 = q[n]$.

From a different perspective, (33) provides a convenient mechanism for obtaining the
representation for a homogeneous signal $x(t)$ in terms of its generating sequence $q[n]$ from
one in terms of its corresponding characteristic sequence $p[n]$, i.e.,

$$p[n] \longrightarrow q[n].$$

¹Relations of this type may be considered discrete-time counterparts of the dilation equations considered
by Strang in [8].

To obtain the reverse mapping

$$q[n] \longrightarrow p[n]$$

is less straightforward. For an arbitrary sequence $q[n]$, the associated characteristic sequence
$p[n]$ is the solution to the linear equation

$$p[n] - \beta^{1/2} \sum_k h[n - 2k]\,p[k] = \beta^{1/2} \sum_k g[n - 2k]\,q[k], \qquad (34)$$

as can be verified by specializing the DWT synthesis equation (6c) to the case of
homogeneous signals. There appears to be no direct method for solving this equation. However,
the DWT synthesis algorithm suggests a convenient and efficient iterative algorithm for
constructing $p[n]$ from $q[n]$. In particular, denoting the estimate of $p[n]$ on the $i$th iteration
by $p^{(i)}[n]$, the algorithm is

$$p^{(0)}[n] = 0 \qquad (35a)$$

$$p^{(i+1)}[n] = \beta^{1/2} \sum_k \left\{ h[n - 2k]\,p^{(i)}[k] + g[n - 2k]\,q[k] \right\}. \qquad (35b)$$

This recursive upsample-filter-merge algorithm, depicted in Fig. 4, can be interpreted as
repeatedly modulating $q[n]$ with the appropriate gain into successively lower octave bands
of the frequency interval $0 < |\omega| \le \pi$. Note that the precomputable quantity

$$q_*[n] = \sum_k g[n - 2k]\,q[k]$$

represents the sequence $q[n]$ modulated into essentially the upper half band of frequencies.
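A minimal sketch of the iteration (35) is given below. It assumes the two-tap Haar QMF pair and a finite-length generating sequence indexed from zero, both of which are our illustrative choices; the function names are hypothetical. Each pass doubles the length of the estimate, reflecting the modulation of $q[n]$ into one additional lower octave band.

```python
import math

S = 1.0 / math.sqrt(2.0)
H_FILT = [S, S]     # assumed Haar lowpass h[n], n = 0, 1
G_FILT = [S, -S]    # assumed Haar highpass g[n], n = 0, 1


def synth_step(p, q, beta):
    """One pass of (35b): sqrt(beta) * sum_k (h[n-2k] p[k] + g[n-2k] q[k])."""
    n_in = max(len(p), len(q))
    p = p + [0.0] * (n_in - len(p))   # zero-pad copies to a common length
    q = q + [0.0] * (n_in - len(q))
    out = [0.0] * (2 * n_in)
    root = math.sqrt(beta)
    for k in range(n_in):
        for j in range(2):            # output index n = 2k + j
            out[2 * k + j] += root * (H_FILT[j] * p[k] + G_FILT[j] * q[k])
    return out


def analyze(p, filt, beta):
    """beta^{-1/2} * sum_k filt[k - 2n] p[k]: conjugate filtering, then downsampling by two."""
    scale = 1.0 / math.sqrt(beta)
    return [scale * (filt[0] * p[2 * n] + filt[1] * p[2 * n + 1])
            for n in range(len(p) // 2)]


def characteristic_sequence(q, beta, iterations):
    """Run (35a)-(35b) for a fixed number of iterations."""
    p = []                            # p^(0)[n] = 0, eq. (35a)
    for _ in range(iterations):
        p = synth_step(p, q, beta)
    return p
```

For an orthonormal filter pair, applying `analyze` with `G_FILT` to any iterate $p^{(i+1)}[n]$ returns $q[n]$, consistent with (33), while applying it with `H_FILT` returns the previous iterate $p^{(i)}[n]$, which is the sense in which the iterates approach a solution of (32). The sketch makes no claim about convergence of the iterates themselves.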

Any real application of homogeneous signals can ultimately exploit scaling properties
over only a finite range of scales, so that it suffices in practice to modulate $q[n]$ into a finite
range of contiguous octave bands. Consequently, only a finite number of iterations of the
algorithm (35) are required. More generally, this also means that many of the theoretical
issues associated with homogeneous signals concerning singularities and convergence do not
present practical difficulties in the application of these signals, as will be apparent in our
developments of Section 4.

Figure 4: Iterative algorithm for the synthesis of the characteristic sequence $p[n]$ of a
homogeneous signal $x(t)$ from its generating sequence $q[n]$. The notation $p^{(i)}[n]$ denotes the
value of $p[n]$ at the $i$th iteration.

Before  turning  to  a  potential  application  of  homogeneous  signal  sets,  we  mention  that 
there  would  appear  to  be  important  connections  to  be  explored  between  the  theory  of 
self-similar  signals  described  here  and  the  work  of  Barnsley,  et  al.,  [9]  on  deterministically 
self-affine  signals.  Interestingly,  the  recent  work  of  Malassenet  and  Mersereau  [10]  has 
shown  that  these  signals,  which  are  conveniently  generated  using  so-called  “iterated  function 
systems”  have  efficient  representations  in  terms  of  wavelet  bases  as  well. 

4  Fractal  Modulation 

In  this  section,  we  consider  the  use  of  homogeneous  signals  as  modulating  waveforms  in  a 
communications-based  context  as  an  example  of  the  direction  that  some  applications  may 
take.  Beginning  with  an  idealized  but  fairly  general  channel  model,  we  demonstrate  that  the 
use  of  homogeneous  waveforms  in  such  channels  is  at  least  natural,  if  not  optimal,  and  leads 
to a multirate modulation strategy in which data is transmitted simultaneously at multiple
rates.  While  it  is  a  preliminary  proposal,  the  modulation  has  a  number  of  properties  that 
seem  appealing. 

Our problem involves the design of a communication system for transmitting a
continuous- or discrete-valued data sequence over a noisy and unreliable continuous-amplitude,
continuous-time channel. We must therefore design a modulator at the transmitter that
embeds the data sequence $q[n]$ into a signal $x(t)$ to be sent over the channel. At the receiver,
a demodulator must be designed for processing the distorted signal $r(t)$ from the channel
to extract an optimal estimate of the data sequence $q[n]$.

In  a  typical  communication  scenario,  the  channel  would  be  “open”  for  some  time  interval 
T,  during  which  it  has  a  particular  bandwidth  W  and  signal-to-noise  ratio  (SNR).  Such  a 
channel  model  can  be  used  to  capture  both  characteristics  of  the  transmission  medium  and 
constraints  inherent  in  one  or  more  receivers.  When  the  noise  characteristics  are  additive, 
the overall channel model is as depicted in Fig. 5, where $z(t)$ represents the noise process.

When either the bandwidth or duration parameters of the channel are known a priori,
there are many well-established methodologies for designing an efficient and reliable
communication system. However, we shall restrict our attention to the case in which both the
bandwidth and duration parameters are either unknown or not available to the transmitter.
This case, by contrast, has received comparatively less attention in the communications
literature, although it encompasses a range of both point-to-point and broadcast
communication scenarios involving, for example, jammed and fading channels, multiple access
channels, covert and low probability of intercept (LPI) communication, and broadcast
communication to disparate receivers.

We shall require the communication system we design for such channels to satisfy the
following performance characteristics:

1. Given a duration-bandwidth product $T \times W$ that exceeds some threshold, we must
be able to transmit $q[n]$ without error in the absence of noise, i.e., $z(t) = 0$.

2. Given increasing duration-bandwidth product in excess of this threshold, we must be
able to transmit $q[n]$ with increasing fidelity in the presence of noise. Furthermore,
in the limit of infinite duration-bandwidth product, perfect transmission should be
achievable at any finite SNR.

The first of these requirements implies that, at least in principle, we ought to be able to
recover $q[n]$ using arbitrarily narrow receiver bandwidth given sufficient duration, or,
alternatively, from an arbitrarily short duration segment given sufficient bandwidth. The second
requirement implies that we ought to be able to obtain better estimates of $q[n]$ the longer
a receiver is able to listen, or the greater the bandwidth it has available. Consequently, the
modulation must contain redundancy of a type that can be exploited for the purposes of
error correction. As we shall demonstrate, the use of homogeneous signals for transmission
appears to be rather naturally suited to fulfilling both these system requirements.

The minimum achievable duration-bandwidth threshold in such a system is a measure
of the efficiency of the modulation. Actually, because the duration-bandwidth threshold
$T \times W$ is a function of the length $L$ of the data sequence, it is more convenient to transform
the duration constraint $T$ into a symbol rate constraint $R = L/T$ and phrase the discussion
in terms of a rate-bandwidth threshold $R/W$ that is independent of sequence length. Then,
the maximum achievable rate-bandwidth threshold constitutes the spectral efficiency of the
modulation, which we shall denote by $\eta$. The spectral efficiency of a transmission scheme
using bandwidth $W$ is, in fact, defined as

$$\eta = \frac{R_{\max}}{W}$$

where $R_{\max}$ is the maximum rate at which perfect communication is possible in the absence
of noise. Hence, the higher the spectral efficiency of a scheme, the higher the rate that
can be achieved for a given bandwidth, or, equivalently, the smaller the bandwidth that is
required to support a given rate.
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When  the  available  channel  bandwidth  is  known  c  priori,  a  reasonably  spectrally  effi¬ 
cient,  if  impractical,  modtilation  of  a  data  sequence  qfn]  involves  expanding  the  sequence 
in  terms  of  an  ideally  bandlimited  orthonormal  basis.  Specifically,  with  Wo  denoting  the 
channel  bandwidth,  a  transmitter  produces 


^(0  =  sine  (Wot  —  n) 

n 


where 


sine  (i)  =  < 


1 

sin  Ttt 

Vi 


t  =  0 
otherwise 


In  the  absence  of  noise,  a  receiver  may  recover  ^[n]  from  the  projections 


9(71]=  f  x(t)  \/Wo  sine  (Wot  -  n)di 
•/  — 00 

which  can  be  implemented  as  a  sequence  of  filter-and-sample  operations.  Since  this  scheme 
achieves  a  rate  of  iZ  =  Wo  symbols/sec  using  the  double-sided  bandwidth  of  W  =  Wb  Hz, 
it  is  characterized  by  a  spectral  efficiency  of 


T)o  =  1  symbol/sec/Hz. 


(36) 
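The benchmark scheme can be checked numerically. The following is a minimal sketch, assuming NumPy and the hypothetical helper names `sinc_modulate` and `sample_demodulate`; it replaces the projection integrals by sampling at t = n/W0, which gives the same result for this basis because sinc(k - n) vanishes for every integer k other than n.

```python
import numpy as np

def sinc_modulate(q, W0, t):
    """Evaluate x(t) = sum_n q[n] sqrt(W0) sinc(W0 t - n) at the times in t."""
    n = np.arange(len(q))
    # np.sinc is the normalized sinc, sin(pi u) / (pi u), with sinc(0) = 1.
    basis = np.sqrt(W0) * np.sinc(W0 * t[:, None] - n[None, :])
    return basis @ q

def sample_demodulate(x_samples, W0):
    """Recover q[n] from the samples x(n / W0) of the transmitted waveform."""
    return x_samples / np.sqrt(W0)

q = np.array([1.0, -2.0, 0.5, 3.0])   # illustrative data sequence
W0 = 8.0                              # illustrative channel bandwidth in Hz
t = np.arange(len(q)) / W0            # one sample per symbol: rate R = W0
q_hat = sample_demodulate(sinc_modulate(q, W0, t), W0)
```

One symbol is recovered per sample at W0 samples/sec, which is the rate R = W0 behind the unit spectral efficiency in (36).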


However, because the transmitter is assumed to have perfect knowledge of the
rate-bandwidth characteristics of the channel, this modulation does not constitute a viable
solution to our communications problem. Indeed, in order to accommodate a decrease in
available channel bandwidth, the transmitter would have to be accordingly reconfigured by
decreasing the parameter $W_0$. Similarly, for the system to maintain a spectral efficiency of
$\eta_0 = 1$ when the available channel bandwidth increases, the transmitter must be
reconfigured by correspondingly increasing the parameter $W_0$. Nevertheless, while not a solution
to our communications problem, this benchmark modulation provides a useful performance
baseline in evaluating the fractal modulation strategy we develop.

We now turn our attention to the problem of designing a modulation strategy that
maintains its spectral efficiency over a broad range of rate-bandwidth combinations using
a fixed transmitter configuration. A rather natural solution to this problem arises out of
the concept of embedding the data to be transmitted into a homogeneous signal. Due to
the fractal properties of the transmitted signals, we refer to the resulting scheme as “fractal
modulation.”

4.1 Transmitter Design: Modulation

To embed a finite-power sequence q[n] into a dy-homogeneous waveform x(t) of degree H, it
suffices to consider using q[n] as the coefficients of an expansion in terms of a wavelet-based
orthonormal self-similar basis of degree H, i.e.,

$$x(t) = \sum_n q[n]\, \theta_n^H(t)$$

where the basis functions $\theta_n^H(t)$ are constructed according to (26). When the basis is derived
from the ideal bandpass wavelet, as we shall generally assume in our analysis, the resulting
waveform x(t) is a power-dominated homogeneous signal whose idealized time-frequency
portrait has the form depicted in Fig. 2. Consequently, we may view this as a multirate
modulation of q[n] where in the mth frequency band q[n] is modulated at rate $2^m$ using a
double-sided bandwidth of $2^m$ Hz. Furthermore, the energy per symbol used in successively
higher bands scales by $\beta = 2^{2H+1}$. Using a suitably designed receiver, q[n] can, in principle,
be recovered from x(t) at an arbitrary rate $2^m$ using a baseband bandwidth of $2^{m+1}$ Hz.
Consequently, this modulation has a spectral efficiency of

$$\eta_F = (1/2) \text{ symbol/sec/Hz.}$$

We emphasize that in accordance with our channel model of Fig. 5, it is the baseband
bandwidth that is important in defining the spectral efficiency since it defines the highest
frequency available at the receiver.

While the spectral efficiency of this modulation is half that of the benchmark scheme
(36), this loss in efficiency is, in effect, the price paid to enable a receiver to use any of
a range of rate-bandwidth combinations in demodulating the data. Fig. 6 illustrates the
rate-bandwidth tradeoffs available to the receiver. In the absence of noise the receiver can,
in principle, perfectly recover q[n] using rate-bandwidth combinations lying on or below the
solid curve. The stepped character of this curve reflects the fact that only rates of the form
$2^m$ can be accommodated, and that full octave increases in bandwidth are required to enable
q[n] to be demodulated at successively higher rates. For reference, the performance of our
benchmark modulation is superimposed on this plot using a dashed line. We emphasize
that in contrast to fractal modulation, the transmitter in the benchmark scheme requires
perfect knowledge of the rate-bandwidth characteristics of the channel.
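The stepped rate-bandwidth characteristic can be sketched in a few lines. The function names below are illustrative, and the straight line used for the benchmark assumes that its bandwidth is measured on the same axis as the fractal scheme with unit efficiency, as the "half that of the benchmark" comparison suggests.

```python
import math

ETA_F = 0.5  # spectral efficiency of fractal modulation, symbols/sec/Hz

def fractal_max_rate(bandwidth_hz):
    """Largest rate 2**m whose required baseband bandwidth 2**(m+1) Hz fits."""
    m = math.floor(math.log2(bandwidth_hz)) - 1
    return 2.0 ** m

def benchmark_max_rate(bandwidth_hz, eta0=1.0):
    """Benchmark line: rate grows linearly with bandwidth (assumed convention)."""
    return eta0 * bandwidth_hz

# The rate only steps up when the bandwidth grows by a full octave.
steps = [(b, fractal_max_rate(b)) for b in (1.0, 2.0, 3.9, 4.0, 7.9, 8.0)]
```

At the octave boundaries the staircase touches the line R = B/2; everywhere else it lies below it.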

Although it considerably simplifies our analysis, the use of the ideal bandpass wavelet
to synthesize the orthonormal self-similar basis in our modulation strategy is impractical
due to the poor temporal localization in this wavelet. However, we may, in practice,
replace the ideal bandpass wavelet with one having not only comparable frequency domain
characteristics and better temporal localization, but sufficiently many vanishing moments
to ensure that the transmitted waveform is power-dominated as well. Fortunately, there are
many suitable wavelets from which to choose, among which are those due to Daubechies [4].
When such wavelets are used, the exact spectral efficiency of the modulation depends on the
particular definition of bandwidth employed. Nevertheless, using any reasonable definition
of bandwidth, we would expect to be able to achieve, in practice, a spectral efficiency close
to (1/2) symbols/sec/Hz with this modulation, and, as a result, we shall assume $\eta_F \approx 1/2$
in subsequent analysis.

Another apparent problem with fractal modulation as initially proposed is that it
requires infinite transmitter power. Indeed, as Fig. 2 illustrates, q[n] is modulated into an
infinite number of octave-width frequency bands. However, in a practical implementation,
only a finite collection of contiguous bands $\mathcal{M}$ would, in fact, be used by the transmitter.
As a result, the transmitted waveform

$$x(t) = \sum_{m \in \mathcal{M}} \sum_n \beta^{-m/2}\, q[n]\, \psi_n^m(t) \tag{37}$$

would exhibit self-similarity only over a range of scales, and demodulation of the data would
be possible at one of only a finite number of rates. In terms of Fig. 6, the rate-bandwidth
characteristic of the modulation would extend over a finite range of bandwidths chosen to
cover extremes anticipated for the system.

Figure 6: Spectral efficiency of fractal modulation. At each bandwidth B, the solid curve
indicates the maximum rate at which transmitted data can be perfectly recovered in the
absence of noise. The dashed curve indicates the corresponding performance of the benchmark
scheme.

The fractal modulation transmitter can be implemented in a computationally highly
efficient manner, since much of the processing can be performed using the discrete-time
algorithms of Section 3.3. For example, synthesizing the waveform x(t) given by (37) for
$\mathcal{M} = \{0, 1, \ldots, M-1\}$ involves two stages. In the first stage, which involves only
discrete-time processing, q[n] is mapped into M consecutive octave-width frequency bands to obtain
the sequence $p^M[n]$. This sequence is obtained using M iterations of the synthesis algorithm
(35) with the QMF filter pair h[n], g[n] appropriate to the wavelet basis. The second stage
then consists of a discrete- to continuous-time transformation in which $p^M[n]$ is modulated
into the continuous-time frequency spectrum via the appropriate scaling function according
to

$$x(t) = \sum_n p^M[n]\, \phi_n^M(t) = \sum_n p^M[n]\, 2^{M/2}\, \phi(2^M t - n).$$

It is important to point out that because a batch-iterative algorithm is employed, potentially
large amounts of data buffering may be required. Hence, while the algorithm may be
computationally efficient, it may be considerably less so in terms of storage requirements.
However, in the event that q[n] is finite length, it is conceivable that the algorithm may be
modified so as to be memory-efficient as well. Such potential remains to be explored.
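The first, discrete-time stage can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the text: the Haar filter pair stands in for the QMF pair of algorithm (35), which is not reproduced in this section; the data are placed in each band as periodic repetitions with gain $\beta^{-m/2}$; and nothing is transmitted below band 0. All function names are hypothetical. A matching analysis routine is included only to check that the data can be read back from any single band.

```python
import numpy as np

def haar_synthesize(approx, detail):
    """One inverse Haar step: merge length-N approx/detail into a length-2N sequence."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def haar_analyze(signal):
    """One forward Haar step: split a length-2N sequence into approx and detail."""
    approx = (signal[0::2] + signal[1::2]) / np.sqrt(2.0)
    detail = (signal[0::2] - signal[1::2]) / np.sqrt(2.0)
    return approx, detail

def fractal_modulate(q, num_bands, H):
    """Embed repetitions of q into num_bands octave bands (discrete-time stage only)."""
    beta = 2.0 ** (2.0 * H + 1.0)
    approx = np.zeros(len(q))  # assumption: no energy below band 0
    for m in range(num_bands):
        # Band m runs at 2**m times the base rate, so it carries 2**m copies of q.
        detail = beta ** (-m / 2.0) * np.tile(q, 2 ** m)
        approx = haar_synthesize(approx, detail)
    return approx

def fractal_demodulate(p, L, num_bands, H):
    """Return one estimate of q per band, averaging the repetitions in that band."""
    beta = 2.0 ** (2.0 * H + 1.0)
    estimates = {}
    approx = p
    for m in reversed(range(num_bands)):
        approx, detail = haar_analyze(approx)
        estimates[m] = beta ** (m / 2.0) * detail.reshape(-1, L).mean(axis=0)
    return estimates

q = np.array([1.0, -1.0, 1.0, 1.0])
p = fractal_modulate(q, num_bands=3, H=0.5)
per_band = fractal_demodulate(p, L=len(q), num_bands=3, H=0.5)
```

The output length doubles with every band, which makes the buffering concern raised above concrete: M bands turn L symbols into L·2^M samples.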

The transmission of finite length sequences using fractal modulation more generally
raises a variety of issues and, therefore, requires some special consideration. In fact, as
initially proposed, fractal modulation is rather inefficient in this case, in essence because
successively higher frequency bands are increasingly underutilized. In particular, we note
from the time-frequency portrait in Fig. 2 that if q[n] has finite length, e.g.,

$$q[n] = 0, \quad n < 0, \; n > L - 1,$$

then the mth band will complete its transmission of q[n] and go idle in half the time it takes
the (m - 1)st band, and so forth. However, finite length messages may be accommodated
rather naturally and efficiently by modulating their periodic extensions q[n mod L] thereby
generating a transmitted waveform

$$x(t) = \sum_n q[n \bmod L]\, \theta_n^H(t)$$

which constitutes a periodicity-dominated homogeneous signal of the type discussed in
Section 3.2. If we let

$$\mathbf{q} = \{\, q[0] \;\; q[1] \;\; \cdots \;\; q[L-1] \,\}$$

denote the data vector, then the time-frequency portrait associated with this signal is shown
in Fig. 7. Using this enhancement of fractal modulation, we not only maintain our ability to
make various rate-bandwidth tradeoffs at the receiver, but we acquire a certain flexibility in
our choice of time origin as well. Specifically, as is apparent from Fig. 7, the receiver need
not begin demodulating the data at t = 0, but may more generally choose a time-origin
that is some multiple of L/R when operating at rate R. Additionally, this strategy can, in
principle, be extended to accommodate data transmission on a block-by-block basis.

The final aspect of fractal modulation that remains to be considered in this section
concerns the specification of the parameter H. While H has no effect on the spectral
efficiency of fractal modulation, it does affect the power efficiency of the scheme. Indeed,
it controls the relative power distribution between frequency bands and, hence, the overall
transmitted power spectrum, which takes the form (30) where $\gamma = 2H + 1$. Consequently,
the selection of H is important when we consider the presence of additive noise in the
channel.

Figure 7: A portion of the time-frequency portrait of the transmitted signal for fractal
modulation of a finite-length data vector q. The case H = -1/2 is shown for convenience.

For traditional additive stationary Gaussian noise channels of known bandwidth, the
appropriate spectral shaping of the transmitted signal is governed by a “water-filling”
procedure [11] [12] which is also the method by which the capacity of such channels is computed
[13]. Using this procedure, the available signal power is distributed in such a way that
proportionally more power is located at frequencies where the noise power is smaller.

When there is uncertainty in the available bandwidth, the water-filling approach leads
to poor worst-case performance. As an example, for a channel in which the noise power is
very small only in some fixed frequency band $0 < \omega_L < \omega_U < \infty$, the water-filling recipe
will locate the signal power predominantly within this band. As a result, the overall SNR
in the channel will strongly depend on whether the channel bandwidth is such that these
frequencies are passed. By contrast, the distribution of power according to a
spectral-matching rule that maintains an SNR that is independent of frequency leads to a system
whose performance is uniform with variations in bandwidth and, in addition, is potentially
well-suited for LPI communication. Since power-dominated homogeneous signals have a
power spectrum of the form of (30), the spectral-matching rule suggests that fractal
modulation may be naturally suited to channels with additive 1/f noise whose degree H is the
same as that of the transmitted signal. This rather broad class of statistically self-similar
processes includes not only classical white Gaussian noise (H = -1/2) and Brownian
motion (H = 1/2), but, more generally, a range of rather prevalent nonstationary noises which
exhibit strong long-term statistical dependence [14].
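The difference between the two power-allocation rules can be made concrete with a small discrete example. The sketch below treats the channel as a handful of parallel bands with known noise powers; the band model, the function names and the numbers are illustrative assumptions rather than part of the development above.

```python
import numpy as np

def water_filling(noise_power, total_power):
    """Classical water-filling: power_i = max(mu - noise_i, 0), summing to total_power."""
    noise = np.asarray(noise_power, dtype=float)
    ordered = np.sort(noise)
    # Try using the k quietest bands, dropping the noisiest until the level mu clears them.
    for k in range(len(ordered), 0, -1):
        mu = (total_power + ordered[:k].sum()) / k
        if mu > ordered[k - 1]:
            break
    return np.maximum(mu - noise, 0.0)

def spectral_matching(noise_power, total_power):
    """Allocate power in proportion to the noise so that the SNR is the same in every band."""
    noise = np.asarray(noise_power, dtype=float)
    return total_power * noise / noise.sum()

noise = np.array([1.0, 2.0, 5.0])          # illustrative per-band noise powers
wf = water_filling(noise, total_power=3.0)
sm = spectral_matching(noise, total_power=3.0)
snr_wf = wf / noise                         # zero in the noisiest band
snr_sm = sm / noise                         # identical in all bands
```

If the channel happens to pass only the noisiest band, the water-filling allocation delivers no signal power at all, whereas the spectrally matched allocation delivers the same SNR as in any other band.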

In  this  section,  we  have  developed  a  modulation  strategy  that  satisfies  the  first  of  the 
two  system  requirements  described  at  the  outset  of  Section  4.  In  the  next  section,  where 
we  turn  our  attention  to  the  problem  of  designing  optimal  receivers  for  fractal  modulation, 
we  shall  see  that  fractal  modulation  also  satisfies  the  second  of  our  system  requirements. 

4.2  Receiver  Design:  Demodulation 

Consider the problem of recovering a finite length message q[n] from band-limited,
time-limited, and noisy observations r(t) of the transmitted waveform x(t) consistent with our
channel model of Fig. 5. We shall assume that the noise z(t) is a Gaussian 1/f process of
degree $H_z = H$, and that the degree $H_x$ of the homogeneous signal x(t) has been chosen
according to our spectral-matching rule, i.e.,

$$H_x = H_z = H. \tag{38}$$

We remark that if it is necessary that the transmitter measure $H_z$ in order to perform
this spectral matching, the robust and efficient parameter estimation algorithms for 1/f
processes developed in [7] may be exploited.

Depending on the nature of the message being transmitted, there are a variety of
different optimization criteria from which to choose in designing a suitable receiver. As a
representative example, we consider the case in which the transmitted message is a random
bit stream of length L represented by a binary-valued sequence

$$q[n] \in \{ +\sqrt{E_0},\, -\sqrt{E_0} \}$$

where $E_0$ is the energy per bit. For this data, we develop a receiver that demodulates q[n]
so as to minimize the probability of a bit-error. Demodulation of non-binary discrete-valued
sequences is achieved using a straightforward extension of our results, and demodulation of
continuous-valued sequences under a minimum mean-square error criterion is described in
[2].

An efficient implementation of the optimum receiver processes the observations r(t) in
the wavelet domain by first extracting the wavelet coefficients $r_n^m$ using the DWT (6). These
coefficients take the form

$$r_n^m = \beta^{-m/2}\, q[n \bmod L] + z_n^m \tag{39}$$

where the $z_n^m$ are the wavelet coefficients of the noise process, and where we have assumed
that in accordance with our discussion in Section 4.1 the periodic replication of the finite
length sequence q[n] has been modulated. To simplify our analysis, we shall further assume
that the ideal bandpass wavelet is used in the transmitter and receiver, although we reiterate
that comparable performance can be achieved when more practical wavelets are used.

The duration-bandwidth characteristics of the channel will in general affect which
observation coefficients $r_n^m$ may be accessed. In particular, if the channel is bandlimited to $2^{M_U+1}$
Hz for some integer $M_U$, this precludes access to the coefficients at scales corresponding
to $m > M_U$. Simultaneously, the duration-constraint in the channel results in a minimum
allowable decoding rate of $2^{M_L}$ symbols/sec for some integer $M_L$, which precludes access to
the coefficients at scales corresponding to $m < M_L$. As a result, the collection of coefficients
available at the receiver is

$$\mathbf{r} = \{ r_n^m,\; m \in \mathcal{M},\; n \in \mathcal{N}(m) \}$$

where

$$\mathcal{M} = \{ M_L, M_L + 1, \ldots, M_U \}$$
$$\mathcal{N}(m) = \{ 0, 1, \ldots, L\, 2^{m - M_L} - 1 \}.$$

This means that we have available

$$K = \sum_{m=M_L}^{M_U} 2^{m - M_L} = 2^{M_U - M_L + 1} - 1 \tag{40}$$

noisy measurements of each of the L non-zero samples of the sequence q[n]. The specific
relationship between decoding rate R, bandwidth W, and redundancy K can, therefore, be
expressed in terms of the spectral efficiency of the modulation as

$$\frac{R}{W} = \frac{2 \eta_F}{K + 1}, \tag{41}$$

where, as discussed earlier, $\eta_F \approx 1/2$. Note that $M_U = M_L$ when K = 1, and (41) attains
its maximum value, $\eta_F$.
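As a quick numerical check of the redundancy bookkeeping, the sketch below reads (40) as the count of coefficients that carry a given symbol across the accessible scales, and (41) as R/W = 2η_F/(K + 1). The function names are illustrative, and the identification R = 2^{M_L}, W = 2^{M_U+1} is this reading's assumption.

```python
def redundancy(M_L, M_U):
    """Number K of noisy measurements of each symbol: 2**(m - M_L) copies at scale m."""
    return sum(2 ** (m - M_L) for m in range(M_L, M_U + 1))

def rate_bandwidth_ratio(K, eta_F=0.5):
    """R/W as a function of the redundancy K."""
    return 2.0 * eta_F / (K + 1)

K_single = redundancy(3, 3)      # one accessible scale
K_four = redundancy(0, 3)        # four accessible scales
ratio_four = rate_bandwidth_ratio(K_four)
```

With four accessible scales the receiver sees 15 copies of every symbol and operates at R/W = 1/16, that is, at rate 2^0 in a bandwidth of 2^4 Hz.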

The optimal decoding of each bit can be described in terms of a binary hypothesis
test on the set of available observation coefficients r. Denoting by $H_1$ the hypothesis in
which $q[n] = +\sqrt{E_0}$ and by $H_0$ the hypothesis in which $q[n] = -\sqrt{E_0}$, we may construct
the likelihood ratio test for the optimal decoding of each symbol q[n]. The derivation
is particularly straightforward because of the fact that, in accordance with the
wavelet-based models for 1/f processes developed in [15] [7] [2], the $z_n^m$ in (39) may be modeled as
independent zero-mean Gaussian random variables with variances

$$\mathrm{Var}\, z_n^m = \sigma_z^2\, \beta^{-m} \tag{42}$$

for some variance parameter $\sigma_z^2 > 0$. Consequently, the likelihood ratio test reduces to the
test

$$\ell = \sum_{m=M_L}^{M_U} \beta^{m/2} \sum_{l=0}^{2^{m-M_L}-1} r_{n+lL}^m \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0$$

under the assumption of equally likely hypotheses, i.e., a random bit stream. The bit-error
probability associated with this optimal receiver is readily derived, and can be expressed as

$$\Pr(\varepsilon) = \Pr(\ell > 0 \mid H_0) = Q\left( \sqrt{K \sigma_c^2} \right) \tag{43}$$

where Q(·) is defined by

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\nu^2/2}\, d\nu$$

and where $\sigma_c^2$ is the SNR in the channel, i.e.,

$$\sigma_c^2 = \frac{E_0}{\sigma_z^2}.$$

Substituting for K in (43) via (41) we can rewrite this error probability in terms of the
channel rate-bandwidth ratio as

$$\Pr(\varepsilon) = Q\left( \sqrt{ \sigma_c^2 \left[ \frac{2 \eta_F}{R/W} - 1 \right] } \right) \tag{44}$$

where, again, $\eta_F \approx 1/2$. Note that the performance of fractal modulation is independent of
the spectral exponent of the noise process when we use spectral matching.
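The detector can be exercised with a small Monte Carlo experiment. The sketch below works directly in the coefficient domain and assumes the model described above: each symbol is observed 2^{m-M_L} times at scale m with gain β^{-m/2}, in independent Gaussian noise of variance σ_z² β^{-m}, and the decision statistic weights scale m by β^{m/2}. The function names, seed and parameter values are illustrative.

```python
import math
import numpy as np

def q_function(x):
    """Gaussian tail probability Q(x), written in terms of the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_ber(M_L, M_U, H, E0, sigma_z2, trials, seed=0):
    """Estimate the bit-error rate of the weighted-sum detector by simulation."""
    rng = np.random.default_rng(seed)
    beta = 2.0 ** (2.0 * H + 1.0)
    bits = rng.choice([-1.0, 1.0], size=trials)
    stat = np.zeros(trials)
    for m in range(M_L, M_U + 1):
        copies = 2 ** (m - M_L)
        noise_std = math.sqrt(sigma_z2 * beta ** (-m))
        noise = rng.normal(0.0, noise_std, size=(trials, copies))
        r = beta ** (-m / 2.0) * math.sqrt(E0) * bits[:, None] + noise
        stat += beta ** (m / 2.0) * r.sum(axis=1)
    decisions = np.where(stat > 0, 1.0, -1.0)
    return float(np.mean(decisions != bits))

def theoretical_ber(M_L, M_U, E0, sigma_z2):
    """Q(sqrt(K * SNR)) with K = 2**(M_U - M_L + 1) - 1 and SNR = E0 / sigma_z2."""
    K = 2 ** (M_U - M_L + 1) - 1
    return q_function(math.sqrt(K * E0 / sigma_z2))

simulated = simulate_ber(M_L=0, M_U=1, H=0.5, E0=1.0, sigma_z2=1.0, trials=200_000)
predicted = theoretical_ber(M_L=0, M_U=1, E0=1.0, sigma_z2=1.0)
```

Repeating the run with a different H changes the per-scale gains and noise variances but not the error rate, in line with the remark that performance under spectral matching does not depend on the spectral exponent.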

To establish a performance baseline, we shall also evaluate a modified version of our
benchmark modulation in which we incorporate repetition-coding, i.e., in which we add
redundancy by transmitting each sample of the message sequence K times in succession.
This comparison scheme is not particularly power efficient both because signal power is
distributed uniformly over the available bandwidth irrespective of the noise spectrum, and
because much more effective redundancy schemes can be used with channels of known
bandwidth (see, e.g., [16]). Nevertheless, with these caveats in mind, such comparisons do
lend some insight into the relative power efficiency of fractal modulation.

In our modified benchmark modulation, incorporating redundancy reduces the effective
decoding rate per unit bandwidth by a factor of K, i.e.,

$$\frac{R}{W} = \frac{\eta_0}{K}, \tag{45}$$

where $\eta_0$ is the efficiency of the modulation without coding, i.e., unity. When the channel
adds stationary white Gaussian noise, for which H = -1/2, the optimum receiver for this
scheme demodulates the received data and averages together the K symbols associated with
the transmitted bit, thereby generating a sufficient statistic. When this statistic is positive,
the receiver decodes a 1-bit, and a 0-bit otherwise. The corresponding performance is,
therefore, given by

$$\Pr(\varepsilon) = Q\left( \sqrt{K \sigma_c^2} \right) = Q\left( \sqrt{ \sigma_c^2\, \frac{\eta_0}{R/W} } \right) \tag{46}$$

where the last equality results from substituting for K via (45).

Comparing (46) with (44), we note that since $\eta_0 \approx 2\eta_F$ the asymptotic bit-error
performances of fractal modulation and the benchmark scheme are effectively equivalent for
$R/W \ll \eta_F$, as is illustrated in Fig. 8. In Fig. 8(a), Pr(ε) is shown as a function of R/W
at a fixed SNR of 0 dB ($\sigma_c^2 = 1$), while in Fig. 8(b), Pr(ε) is shown as a function of SNR
at a fixed R/W = 0.1 bits/sec/Hz. Both these plots reveal strong thresholding behavior
whereby the error probability falls off dramatically at high SNR and low R/W. We
emphasize that comparisons between the two schemes are appropriate only for the case in which
the noise has parameter H = -1/2, corresponding to the case of stationary white Gaussian
noise. For other values of H, the performance of the benchmark modulation is not only
difficult to evaluate, but necessarily poor as well because of inefficient distribution of power
among frequencies.
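The two closed-form error probabilities can be compared directly. The sketch below takes the fractal-modulation result to be Q(sqrt(σ_c² (2η_F W/R - 1))) and the repetition-coded benchmark result to be Q(sqrt(σ_c² η_0 W/R)), which is what substituting K from (41) and (45) into Q(sqrt(K σ_c²)) gives; the function names are illustrative and the formulas are valid only for R/W no larger than the respective spectral efficiency.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fractal_ber(rate_over_bw, snr, eta_F=0.5):
    """Bit-error probability of fractal modulation at a given R/W and linear SNR."""
    return q_function(math.sqrt(snr * (2.0 * eta_F / rate_over_bw - 1.0)))

def benchmark_ber(rate_over_bw, snr, eta0=1.0):
    """Bit-error probability of the repetition-coded benchmark at a given R/W and SNR."""
    return q_function(math.sqrt(snr * eta0 / rate_over_bw))

# Operating point of Fig. 8: SNR of 0 dB and R/W = 0.1.
p_fractal = fractal_ber(0.1, 1.0)
p_benchmark = benchmark_ber(0.1, 1.0)

# Effective SNR penalty of fractal modulation relative to the benchmark, in dB.
penalty_db = [10.0 * math.log10(1.0 - r) for r in (0.1, 0.01, 0.001)]
```

With η_0 = 2η_F = 1 the arguments of Q differ only by the factor (1 - R/W), so the penalty shrinks toward 0 dB as R/W decreases, which is the asymptotic equivalence noted above.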

5  Concluding  Comments 

We  have  developed  convenient,  efficient,  and  robust  wavelet-based  representations  for  a 
generalized  class  of  homogeneous  signals,  and  explored  their  properties.  Furthermore,  we 
have  explored  their  potential  for  use  as  modulating  waveforms  in  a  communications-based 
application,  and  demonstrated  that  fractal  modulation  would  appear  to  be  well-suited  for 
use  with  noisy  channels  of  simultaneously  uncertain  duration  and  bandwidth. 

While our development of fractal modulation considered many issues, many others, such
as synchronization and buffering, remain to be investigated. Furthermore, there are many
potential refinements to be explored. One might involve the incorporation of block or trellis
coding techniques to improve the power efficiency of the modulation. It would seem that
coding of this type cannot be incorporated without sacrificing properties of the transmission
scheme. In particular, the simple redundancy scheme apparent in Fig. 7 enables the recovery
of a message q from observations corresponding to any single cell of the time-frequency
plane. Nevertheless, it would be important to identify the tradeoffs involved.

Figure 8: Bit-error rate performance of fractal modulation. Solid lines indicate the
performance of fractal modulation, while dashed lines indicate the performance of the benchmark
modulation with repetition coding. (a) Bit-error probability Pr(ε) as a function of
rate/bandwidth ratio R/W (symbols/sec/Hz) at 0 dB SNR. (b) Bit-error probability Pr(ε) as a
function of SNR (dB) at R/W = 0.1 symbols/sec/Hz.

The potential of fractal modulation in LPI applications also remains to be explored.
While we have argued that the second-order statistics of homogeneous signals are effectively
indistinguishable from those of 1/f noises, a more comprehensive study of the detectability
of homogeneous signals is warranted. In the process, some potentially useful extensions to
fractal modulation may arise. As an example, drawing from the notions underlying
direct-sequence spread spectrum, one technique for more effectively concealing the modulation
from unintended receivers might involve premultiplying the entire wavelet coefficient field
$x_n^m$ of the signal x(t) prior to transmission by a pseudorandom bit field known to both
transmitter and receiver.
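The premultiplication idea can be sketched as follows, under the assumptions that the "bit field" is a field of ±1 values and that both ends derive it from a shared key. The helper name `scramble` and the use of NumPy's general-purpose generator are illustrative choices; that generator is not cryptographically secure and stands in here only for whatever pseudorandom source a real system would use.

```python
import numpy as np

def scramble(coeffs, key):
    """Multiply a coefficient field elementwise by a keyed pseudorandom +/-1 field."""
    rng = np.random.default_rng(key)
    signs = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return coeffs * signs

# Illustrative coefficient field: rows are scales, columns are time indices.
field = np.arange(1.0, 13.0).reshape(3, 4)
sent = scramble(field, key=1234)
received = scramble(sent, key=1234)   # the same key undoes the sign flips
```

Because every sign squares to one, applying the same keyed field a second time restores the original coefficients, while the magnitudes, and hence the per-scale power distribution, are unchanged by the scrambling.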

Finally, we remark that there would appear to be many additional applications for the
self-similar signals we have introduced in this paper. In many respects, identifying and
exploring other potentially promising applications represents perhaps the most exciting
direction for future research.

A Proof of Theorem 2

To show that y(t) has finite energy, we exploit an equivalent synthesis for y(t) as the output
of a cascade of filters driven by x(t), the first of which is an ideal bandpass filter whose
passband includes $\omega_L < |\omega| < \omega_U$, and the second of which is the filter given by (10).
Let $b_m(t)$ be the impulse response of a filter whose frequency response is given by

$$B_m(\omega) = \begin{cases} 1 & 2^m \pi < |\omega| \le 2^{m+1} \pi \\ 0 & \text{otherwise} \end{cases} \tag{47}$$

and let b(t) be the impulse response corresponding to (10). Furthermore, choose finite
integers $M_L$ and $M_U$ such that $2^{M_L} \pi < \omega_L$ and $\omega_U < 2^{M_U+1} \pi$. Then, using * to denote
convolution,

$$y(t) = b(t) * \left[ \sum_{m=M_L}^{M_U} b_m(t) \right] * x(t) = b(t) * \sum_{m=M_L}^{M_U} \tilde{x}_m(t) \tag{48}$$

where

$$\tilde{x}_m(t) = x(t) * b_m(t) = 2^{-mH}\, \tilde{x}_0(2^m t), \tag{49}$$

and where the last equality in (49) results from an application of the self-similarity relation
(8) and the identity

$$b_m(t) = 2^m\, b_0(2^m t).$$

Because x(t) is energy-dominated, $\tilde{x}_0(t)$ has finite energy. Hence, (49) implies that every
$\tilde{x}_m(t)$ has finite energy. Exploiting this fact in (48) allows us to conclude that y(t) must
have finite energy as well.
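The scaling step behind (49) can be written out explicitly. The derivation below assumes that the self-similarity relation (8), which is not reproduced in this section, has the form $x(t) = 2^{-kH} x(2^k t)$ for every integer k; it also uses the identity $b_m(t) = 2^m b_0(2^m t)$.

```latex
\begin{align*}
\tilde{x}_m(t)
  &= \int_{-\infty}^{\infty} x(\tau)\, b_m(t - \tau)\, d\tau
   = \int_{-\infty}^{\infty} x(\tau)\, 2^m\, b_0\bigl(2^m (t - \tau)\bigr)\, d\tau \\
  &= \int_{-\infty}^{\infty} x(2^{-m} u)\, b_0(2^m t - u)\, du
     && \text{substituting } u = 2^m \tau \\
  &= 2^{-mH} \int_{-\infty}^{\infty} x(u)\, b_0(2^m t - u)\, du
     && \text{since } x(2^{-m} u) = 2^{-mH} x(u) \text{ by (8) with } k = m \\
  &= 2^{-mH}\, \tilde{x}_0(2^m t).
\end{align*}
```

Taking Fourier transforms of both sides then gives $\tilde{X}_m(\omega) = 2^{-m(H+1)} \tilde{X}_0(2^{-m}\omega)$, the form needed when (49) is used in the frequency domain.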

To verify the spectrum relation (11), we express (48) in the Fourier domain. Exploiting
the fact that we may arbitrarily extend the limits in the summation in (48), we get

$$Y(\omega) = \begin{cases} X(\omega) & \omega_L < |\omega| < \omega_U \\ 0 & \text{otherwise} \end{cases}$$

where $\tilde{X}_m(\omega)$ denotes the Fourier transform of $\tilde{x}_m(t)$, and where

$$X(\omega) = \sum_m \tilde{X}_m(\omega). \tag{50}$$

The right-hand side of (50) is, of course, pointwise convergent because for each $\omega$ at most
one term in the sum is non-zero. Finally, exploiting (49) in (50) gives

$$X(\omega) = \sum_m 2^{-m(H+1)}\, \tilde{X}_0(2^{-m} \omega),$$

which, as one can readily verify, satisfies (12). ■

B Proof of Theorem 3

To prove the “only if” statement, we suppose $x(t) \in E^H$ and begin by expressing x(t) in
terms of the ideal bandpass wavelet basis. In particular, we let

$$x(t) = \sum_m \tilde{x}_m(t)$$

where

$$\tilde{x}_m(t) = \beta^{-m/2} \sum_n \tilde{q}[n]\, \tilde{\psi}_n^m(t)$$

and where $\tilde{q}[n]$, the generating sequence in this basis, has energy $\tilde{E} < \infty$. The new
generating sequence q[n] can then be expressed as

$$q[n] = \sum_m q_m[n] \tag{51}$$

where

$$q_m[n] = y_m(t) \big|_{t=n}$$

and

$$y_m(t) = \tilde{x}_m(t) * \psi(-t).$$

For each m, since $\tilde{x}_m(t)$ is bandlimited, $y_m(t)$ and $q_m[n]$ each have finite energy and Fourier
transforms $Y_m(\omega)$ and $Q_m(\omega)$ respectively. Hence,

$$Q_m(\omega) = \sum_k Y_m(\omega - 2\pi k) \tag{52}$$

where

$$Y_m(\omega) = \begin{cases} (2\beta)^{-m/2}\, \tilde{Q}(2^{-m}\omega)\, \Psi^*(\omega) & 2^m \pi < |\omega| \le 2^{m+1} \pi \\ 0 & \text{otherwise} \end{cases}$$

with $\tilde{Q}(\omega)$ denoting the Fourier transform of $\tilde{q}[n]$, and $\Psi^*(\omega)$ the complex conjugate of
$\Psi(\omega)$.

In deriving bounds on the energy $E_m$ in each sequence $q_m[n]$ for a fixed m, it is convenient
to consider the cases $m \le -1$ and $m \ge 0$ separately. When $m \le -1$, the sampling by which
$q_m[n]$ is obtained involves no aliasing. Since on $|\omega| \le \pi$ we then have

$$Q_m(\omega) = Y_m(\omega)$$

we may deduce that $q_m[n]$ has energy

$$E_m = \sum_n |q_m[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |Y_m(\omega)|^2\, d\omega. \tag{53}$$

Because $\psi(t)$ has R vanishing moments, there exists a $0 < \epsilon_0 < \infty$ such that

$$|\Psi(\omega)| < \epsilon_0\, |\omega|^R \tag{54}$$

for all $\omega$. Exploiting this in (53) we obtain

$$E_m \le C_0\, 2^{(2R - \gamma) m}\, \tilde{E} \tag{55}$$

for some $0 < C_0 < \infty$.

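Consider, next, the case corresponding to $m \ge 0$. Since $\psi(t)$ has R vanishing moments,
there also exists a $0 < \epsilon_1 < \infty$ such that

$$|\Psi(\omega)| < \epsilon_1\, |\omega|^{-R} \tag{56}$$

for all $\omega$. Hence, on $2^m \pi < |\omega| \le 2^{m+1} \pi$,

$$|Y_m(\omega)| < \epsilon_1\, \pi^{-R}\, 2^{-(\gamma + 1 + 2R) m / 2}\, |\tilde{Q}(2^{-m}\omega)|. \tag{57}$$

From (52), we obtain

$$|Q_m(\omega)| < \epsilon_1\, \pi^{-R}\, 2^{-(\gamma + 1 + 2R) m / 2} \sum_{k=0}^{2^m - 1} |\tilde{Q}(2^{-m}\omega + 2\pi k 2^{-m})| \tag{58}$$

by exploiting, in order, the triangle inequality, the bound (57), the fact that only $2^m$ terms
in the summation in (52) are non-zero since $y_m(t)$ is bandlimited, and the fact that $\tilde{Q}(\omega)$
is $2\pi$-periodic. In turn, we may use, in order, (58), the Schwarz inequality, and again the
periodicity of $\tilde{Q}(\omega)$ to conclude that

$$E_m \le \frac{\epsilon_1^2\, \pi^{-2R}}{2\pi}\, 2^{-(\gamma + 2R) m} \sum_{k=0}^{2^m - 1} \int_{-\pi}^{\pi} |\tilde{Q}(2^{-m}\omega + 2\pi k 2^{-m})|^2\, d\omega = C_1\, 2^{-(\gamma + 2R - 1) m}\, \tilde{E} \tag{59}$$

for some $0 < C_1 < \infty$.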

Using (51), the triangle inequality, and the Schwarz inequality, we obtain the following
bound on the energy in q[n]

$$E = \sum_n |q[n]|^2 \le \left[ \sum_m \sqrt{E_m} \right]^2$$

which, from (59) and (55) is finite provided $0 < \gamma < 2R$ and $R \ge 1$.

Let us now show the converse. Suppose q[n] has energy $E < \infty$, and express x(t) as

$$x(t) = \sum_m x_m(t)$$

where

$$x_m(t) = \beta^{-m/2} \sum_n q[n]\, \psi_n^m(t).$$

If we let

$$y_m(t) = b_0(t) * x_m(t)$$

where $b_0(t)$ is the impulse response of the ideal bandpass filter in Definition 1, it suffices to
show that

$$y(t) = \sum_m y_m(t) \tag{60}$$

has finite energy.

For each m, we begin by bounding the energy in $y_m(t)$, which is finite because $x_m(t)$
has finite energy. Since $y_m(t)$ has Fourier transform

$$Y_m(\omega) = \begin{cases} (2\beta)^{-m/2}\, Q(2^{-m}\omega)\, \Psi(2^{-m}\omega) & \pi < |\omega| \le 2\pi \\ 0 & \text{otherwise} \end{cases}$$

where $Q(\omega)$ is the discrete-time Fourier transform of q[n], we get that

$$E_m = \frac{1}{2\pi\, (2\beta)^m} \int_{\pi < |\omega| \le 2\pi} |Q(2^{-m}\omega)|^2\, |\Psi(2^{-m}\omega)|^2\, d\omega.$$

Again, it is convenient to consider the cases corresponding to $m \le -1$ and $m \ge 0$ separately.
For $m \le -1$, most of the energy in $x_m(t)$ is at frequencies below the passband of the
bandpass filter. Hence, using the bound (56) and exploiting the periodicity of $Q(\omega)$ we
obtain

$$E_m \le C_0\, 2^{(2R - \gamma - 1) m}\, E \tag{61}$$

for some $0 < C_0 < \infty$. For $m \ge 0$, most of the energy in $x_m(t)$ is at frequencies higher than
the passband of the bandpass filter. Hence, using the bound (54) we obtain

$$E_m \le C_1\, 2^{-(\gamma + 2R + 1) m}\, E \tag{62}$$

for some $0 < C_1 < \infty$.

Finally, using (60), the triangle inequality, and the Schwarz inequality, we obtain the
following bound on the energy in y(t)

$$\int_{-\infty}^{\infty} |y(t)|^2\, dt \le \left[ \sum_m \sqrt{E_m} \right]^2$$

which, from (62) and (61) is finite provided $0 < \gamma < 2R - 1$ since $R \ge 1$. ■

C Proof of Theorem 5


Following an approach analogous to the proof of Theorem 2, let $b_m(t)$ be the impulse
response of a filter whose frequency response is given by (47), and let b(t) be the impulse
response corresponding to (10). By choosing finite integers $M_L$ and $M_U$ such that
$2^{M_L} \pi < \omega_L$ and $\omega_U < 2^{M_U+1} \pi$ we can again express y(t) in the form of eq. (48). Because x(t) is
power-dominated, $\tilde{x}_0(t)$ has finite power. Hence, (49) implies that every $\tilde{x}_m(t)$ has finite
power. Exploiting this fact in (48) allows us to conclude that y(t) must have finite power
as well.

To verify the spectrum relation (28), we use (48) together with the fact that the $\tilde{x}_m(t)$
are uncorrelated for different m to obtain

$$S_y(\omega) = \begin{cases} S_x(\omega) & \omega_L < |\omega| < \omega_U \\ 0 & \text{otherwise} \end{cases}$$

where $S_{\tilde{x}_m}(\omega)$ denotes the power spectrum of $\tilde{x}_m(t)$, and where

$$S_x(\omega) = \sum_{m=-\infty}^{\infty} S_{\tilde{x}_m}(\omega). \tag{63}$$

Again we have exploited the fact that the upper and lower limits on the summation in
(48) may be extended to $\infty$ and $-\infty$, respectively. The right-hand side of (63) is, again,
pointwise convergent because for each $\omega$ at most one term in the sum is non-zero. Finally,
exploiting (49) in (63) gives

$$S_x(\omega) = \sum_m 2^{-\gamma m}\, S_{\tilde{x}_0}(2^{-m} \omega)$$

which, as one can readily verify, satisfies (29). ■